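As sequencing data accumulate, reusing them becomes a challenge. The recount project, currently recount3, has collected 8,679 human and 10,088 mouse datasets, more than 750,000 samples in total, and processed them uniformly to produce gene-level and exon-level expression as well as exon-exon junction data. I first learned about the recount project many years ago, and it is odd that so few introductions to it exist. It has three main advantages:
1. recount processes all of the data uniformly, which avoids bias introduced by differing analysis pipelines;
2. recount provides very large-scale preprocessed data that can be used directly, so researchers do not have to start from the raw data;
3. recount provides simple, easy-to-use tools that make it convenient to download and process the data.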
Method 1: search and filter the project list, then download (using TCGA-OV as the example)
library(recount3)
# The same listing is available in the web explorer:
# https://jhubiostatistics.shinyapps.io/recount3-study-explorer/
# The table shows project_home and project, covering TCGA, GTEx and SRA
human_projects <- available_projects()
proj_info <- subset(
    human_projects,
    project == "OV" & project_type == "data_sources"
)
rse_gene_tcga_ov <- create_rse(proj_info)
# Scale the raw counts and store them in the "counts" assay
assays(rse_gene_tcga_ov)$counts <- transform_counts(rse_gene_tcga_ov)
# Compute TPMs (this calls the recount package, which must be installed)
assays(rse_gene_tcga_ov)$TPM <- recount::getTPM(rse_gene_tcga_ov, length_var = "score")
# Check the TPMs: every value should equal 1
colSums(assay(rse_gene_tcga_ov, "TPM")) / 1e6
# View the sample annotation
View(as.data.frame(colData(rse_gene_tcga_ov)))
# View the gene annotation
View(as.data.frame(rowRanges(rse_gene_tcga_ov)))
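To make the transform_counts() step less of a black box, here is a minimal base-R sketch of scaling by total coverage (AUC). It assumes the scaling multiplies each sample's raw coverage sums by target size / AUC and then rounds, with a target of 40 million; toy_scale_by_auc and all the numbers are hypothetical, and this is an illustration rather than recount3's actual implementation.

```r
# Toy raw coverage sums: 2 genes (rows) x 2 samples (columns).
# All values are made up for illustration.
raw <- matrix(
    c(2e6, 6e6, 1e6, 1e6),
    nrow = 2,
    dimnames = list(c("geneA", "geneB"), c("s1", "s2"))
)
# Hypothetical per-sample total coverage (area under the coverage curve)
auc <- c(s1 = 8e9, s2 = 4e9)

# Hypothetical helper: scale every sample to the same target library size.
# Assumption: scaled = raw * target_size / AUC, then rounded.
toy_scale_by_auc <- function(raw, auc, target_size = 4e7) {
    # t(t(raw) * v) multiplies column j of raw by v[j]
    round(t(t(raw) * (target_size / auc)))
}

scaled <- toy_scale_by_auc(raw, auc)
scaled
```

After scaling, the two samples are comparable even though their sequencing depth differed by a factor of two.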
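Method 2: select the dataset directly and generate the download code
Select the dataset you want on this site, https://jhubiostatistics.shinyapps.io/recount3-study-explorer/, and the download code is shown at the bottom of the page. The annotation does not have to be Gencode v26; RefSeq v109, Gencode v29 and others are also available.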
rse_gene_tcga_ov = recount3::create_rse_manual(
project = "OV",
project_home = "data_sources/tcga",
organism = "human",
annotation = "gencode_v26",
type = "gene"
)
# Scale the raw counts and store them in the "counts" assay
assays(rse_gene_tcga_ov)$counts <- transform_counts(rse_gene_tcga_ov)
# Compute TPMs
assays(rse_gene_tcga_ov)$TPM <- recount::getTPM(rse_gene_tcga_ov, length_var = "score")
# Check the TPMs: every value should equal 1
colSums(assay(rse_gene_tcga_ov, "TPM")) / 1e6
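The reason the TPM check should give exactly 1 per sample follows from the standard TPM definition: divide counts by gene length, then rescale each sample so the values sum to one million. A small base-R sketch on toy data shows this; toy_tpm and the numbers are hypothetical, and it illustrates the textbook formula rather than the internals of recount::getTPM.

```r
# Toy data: 3 genes (rows) x 2 samples (columns); values are illustrative only.
counts <- matrix(
    c(100, 200, 700,
      50,  50,  900),
    nrow = 3,
    dimnames = list(c("geneA", "geneB", "geneC"), c("s1", "s2"))
)
gene_length <- c(geneA = 1000, geneB = 2000, geneC = 3500)

# Hypothetical helper implementing the textbook TPM definition.
toy_tpm <- function(counts, gene_length) {
    # Length-normalised rate; the vector recycles down each column,
    # so row i is divided by gene_length[i]
    rate <- counts / gene_length
    # Rescale each sample (column) to sum to one million
    t(t(rate) / colSums(rate)) * 1e6
}

tpm <- toy_tpm(counts, gene_length)
# Same check as in the post: each sample should give 1
colSums(tpm) / 1e6
```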
Downloading the exon-exon junction data:
rse_jxn_tcga_ov = recount3::create_rse_manual(
project = "OV",
project_home = "data_sources/tcga",
organism = "human",
annotation = "gencode_v26",
type = "jxn"
)
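Note, however, that the data host appears to be unreachable from some networks in mainland China, in which case the download fails.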
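When the connection is merely unstable rather than blocked, retrying can help. Below is a minimal base-R sketch of a retry wrapper; with_retry is a hypothetical helper, not part of recount3, and it cannot help if the host is unreachable altogether. The recount3 functions also accept a recount3_url argument for pointing at a different host, should a reachable mirror be available to you.

```r
# Hypothetical helper: call fetch() up to `attempts` times,
# waiting `wait_seconds` between failed attempts.
with_retry <- function(fetch, attempts = 3, wait_seconds = 5) {
    last_error <- NULL
    for (i in seq_len(attempts)) {
        # Capture an error as a value instead of aborting
        result <- tryCatch(fetch(), error = function(e) e)
        if (!inherits(result, "error")) {
            return(result)
        }
        last_error <- result
        message("Attempt ", i, " failed: ", conditionMessage(result))
        if (i < attempts) Sys.sleep(wait_seconds)
    }
    stop("All ", attempts, " attempts failed: ", conditionMessage(last_error))
}

# Usage with the download call from this post (not run here; needs network access):
# rse_jxn_tcga_ov <- with_retry(function() {
#     recount3::create_rse_manual(
#         project = "OV",
#         project_home = "data_sources/tcga",
#         organism = "human",
#         annotation = "gencode_v26",
#         type = "jxn"
#     )
# })
```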
References:
https://www.bioconductor.org/packages/devel/bioc/vignettes/recount3/inst/doc/recount3-quickstart.html
https://rna.recount.bio/
https://rna.recount.bio/docs/bioconductor.html#recount3
####################################################################
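#All rights reserved. Please notify the author before reposting. Copyright belongs to the author, and any infringement discovered will be pursued under the law.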
#Author: Jason
#####################################################################