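As sequencing data keep accumulating, reusing them has become a challenge. The recount project, now in its third version (recount3), has collected 8,679 human and 10,088 mouse datasets, more than 750,000 samples in total. All of them were uniformly processed to produce gene- and exon-level expression as well as exon-exon junction data. I first came across recount many years ago, and I find it strange that so few people write about it.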

First, recount uniformly processed all of the data, which avoids the biases introduced by differing analysis pipelines;

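Second, recount provides preprocessed data at a very large scale that can be used directly, so researchers do not have to start from the raw reads;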

Third, recount provides simple, easy-to-use tools that make it convenient for researchers to download and process the data.

Method 1: using TCGA-OV as an example, search the project table, filter it, then download

library(recount3)

# Same catalogue as https://jhubiostatistics.shinyapps.io/recount3-study-explorer/
# The table lists project_home and project for TCGA, GTEx and SRA
human_projects <- available_projects()

proj_info <- subset(
  human_projects,
  project == "OV" & project_type == "data_sources"
)

rse_gene_tcga_ov <- create_rse(proj_info)

# Convert the raw base-pair coverage sums into read counts
assays(rse_gene_tcga_ov)$counts <- transform_counts(rse_gene_tcga_ov)
## Compute TPMs (getTPM() comes from the recount package, which must also be installed)
assays(rse_gene_tcga_ov)$TPM <- recount::getTPM(rse_gene_tcga_ov, length_var = "score")
## Check the TPMs: every value below should equal 1
colSums(assay(rse_gene_tcga_ov, "TPM")) / 1e6

# View sample annotation
View(data.frame(colData(rse_gene_tcga_ov)))

# View gene annotation
View(data.frame(rowRanges(rse_gene_tcga_ov)))
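To make the TPM step less of a black box, here is a minimal base-R sketch of the calculation as I understand recount::getTPM() performs it (through getRPKM()): divide counts by gene length in kilobases and by library size in millions, then rescale each sample so it sums to one million. The function name tpm_from_counts, the count matrix and the gene lengths are made-up illustrations, not part of recount or recount3; in the real call the lengths come from the rowRanges column named by length_var.

```r
# Hypothetical helper illustrating the counts -> RPKM -> TPM arithmetic.
# Not recount's source code; all input numbers below are invented.
tpm_from_counts <- function(counts, gene_length_bp) {
  lib_size <- colSums(counts)                       # library size per sample
  rpkm <- counts / (gene_length_bp / 1e3)           # per kb of gene length (recycled down each column)
  rpkm <- sweep(rpkm, 2, lib_size / 1e6, "/")       # per million reads in each sample
  sweep(rpkm, 2, colSums(rpkm), "/") * 1e6          # rescale each sample to sum to 1e6
}

counts <- matrix(c(100, 200, 300,
                   50,  80, 120),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2")))
gene_length_bp <- c(1000, 2000, 1500)

tpm <- tpm_from_counts(counts, gene_length_bp)
colSums(tpm) / 1e6   # same sanity check as above: 1 for every sample
```

The library-size term cancels once each column is rescaled, which is why the colSums check should return exactly 1 for every sample.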

Method 2: select the dataset directly and generate the download code

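Select the dataset you want on https://jhubiostatistics.shinyapps.io/recount3-study-explorer/, and the download code is shown at the bottom of the page. The annotation is not limited to gencode_v26; other choices include RefSeq v109, GENCODE v29 and so on.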

library(recount3)

rse_gene_tcga_ov = recount3::create_rse_manual(
    project = "OV",
    project_home = "data_sources/tcga",
    organism = "human",
    annotation = "gencode_v26",
    type = "gene"
)

# Convert the raw base-pair coverage sums into read counts
assays(rse_gene_tcga_ov)$counts <- transform_counts(rse_gene_tcga_ov)
## Compute TPMs (requires the recount package)
assays(rse_gene_tcga_ov)$TPM <- recount::getTPM(rse_gene_tcga_ov, length_var = "score")
## Check the TPMs: every value below should equal 1
colSums(assay(rse_gene_tcga_ov, "TPM")) / 1e6
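The transform_counts() step is needed because the raw_counts assay in recount3 holds per-feature sums of base-pair coverage rather than read counts. The sketch below assumes the default AUC-based scaling with targetSize = 4e7 * 100 and L = 100 (check ?recount3::transform_counts for your installed version): each sample is scaled so that its total coverage (AUC) equals the target, and the result is divided by the read length. The helper scale_coverage_by_auc and the toy numbers are hypothetical; this is not the package's own code.

```r
# Hypothetical helper approximating recount3's AUC-based scaling under its
# assumed defaults (targetSize = 4e7 * 100, L = 100). Not package code.
scale_coverage_by_auc <- function(coverage_sum, auc,
                                  target_size = 4e7 * 100,
                                  read_length = 100,
                                  round_counts = TRUE) {
  # One factor per sample: bring each sample's total coverage (AUC) to target_size
  scale_factor <- target_size / auc
  # Multiply column j by scale_factor[j], then divide by read length to turn
  # base-pair coverage into approximate read counts
  scaled <- sweep(coverage_sum, 2, scale_factor, "*") / read_length
  if (round_counts) round(scaled) else scaled
}

# Toy data: 2 genes x 2 samples; sample s2 has half the total coverage of s1
coverage_sum <- matrix(c(5000, 10000, 2000, 4000), nrow = 2,
                       dimnames = list(c("geneA", "geneB"), c("s1", "s2")))
auc <- c(s1 = 4e9, s2 = 2e9)

counts <- scale_coverage_by_auc(coverage_sum, auc)
counts
```

Scaling every sample to the same total coverage is what makes the resulting counts comparable across samples sequenced at different depths.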

Downloading exon-exon junction data:

rse_jxn_tcga_ov = recount3::create_rse_manual(
    project = "OV",
    project_home = "data_sources/tcga",
    organism = "human",
    annotation = "gencode_v26",
    type = "jxn"
)

However, some of these resources apparently cannot be accessed from mainland China, so the download may fail there.

References:

https://www.bioconductor.org/packages/devel/bioc/vignettes/recount3/inst/doc/recount3-quickstart.html

https://rna.recount.bio/

https://rna.recount.bio/docs/bioconductor.html#recount3

####################################################################

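#All rights reserved. Please notify the author before reposting. Copyright belongs to the author; any infringement, once discovered, will be pursued legally.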

#Author: Jason

#####################################################################