Tag archive: SRA

Downloading sequencing data with SRA-Explorer

While downloading data I happened across SRA-Explorer, and it turned out to be quite handy. Address: https://sra-explorer.info/

The site itself is very small (see https://github.com/ewels/sra-explorer) and is built on the SRA API.

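After searching, select the samples you want to download, click "Add ** to collection", then click "saved datasets" in the upper right corner. The lower part of the page then shows the links to the raw fastq files, curl commands for downloading the fastq files, aspera commands for downloading the fastq files, commands for downloading the SRA files, and the sample metadata. Very convenient.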
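If you copy the fastq links into a text file rather than using the generated curl commands directly, a small loop can fetch them in one go. The following is only a sketch under assumptions: the function name `download_fastq_list`, the file name `fastq_urls.txt`, and the `DRY_RUN` switch are all hypothetical and not part of SRA-Explorer.

```shell
# Download every URL listed in a text file (one URL per line) with curl.
# Hypothetical helper: "fastq_urls.txt" would hold links copied from the
# "saved datasets" page. Set DRY_RUN=1 to print the commands instead.
download_fastq_list() {
    local list_file="$1" out_dir="${2:-.}" url target
    mkdir -p "$out_dir"
    while IFS= read -r url || [ -n "$url" ]; do
        # Skip blank lines and lines starting with '#'.
        case "$url" in ''|'#'*) continue ;; esac
        target="$out_dir/$(basename "$url")"
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "curl -L -C - -o $target $url"
        else
            # -L follows redirects, -C - resumes a partial download.
            curl -L -C - -o "$target" "$url"
        fi
    done < "$list_file"
}

# Usage (not executed here):
#   download_fastq_list fastq_urls.txt fastq/
```

Running it once with `DRY_RUN=1` is a cheap way to check the target file names before starting a long download.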

It is similar to SRA's Run Selector: https://www.ncbi.nlm.nih.gov/Traces/study/?

#####################################################################
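#All rights reserved. Please notify the author before reposting. Copyright belongs to the author; any infringement found will be pursued legally.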
#Author: Jason
#####################################################################

How to extract paired-end reads from SRA files (reprint)

SRA (NCBI) stores each sequencing run as a single "sra" or "lite.sra" file. If the data come from paired-end sequencing, you will usually want separate files for the two mates. When I run the SRA toolkit's "fastq-dump" utility on paired-end SRA files, I sometimes get only one file in which all the mate pairs are stored together, rather than two or three files. The solution is to always run fastq-dump with the "--split-3" option. If the experiment is single-end sequencing, only one fastq file will be generated. If it is paired-end sequencing, there may be two or three fastq files. Two files (with suffixes "1" and "2") hold the matched mate-pair reads, whereas the third one (without any suffix) contains all the reads that do not have a mate (or for which SRA couldn't resolve the mate pair).

Hope my experiences with NCBI SRA data handling help the readership.

source:

SRA toolkit document:

An example command:

./fastq-dump.2 --split-3 ~/Desktop/ERR068552.sra -O ~/Desktop/temp.fastaq/
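To check which of the cases described above a run fell into, you can look at the files that were written. This is a minimal sketch, assuming the outputs are named `<accession>_1.fastq`, `<accession>_2.fastq` and `<accession>.fastq` (uncompressed); the function name `classify_split3_output` is hypothetical.

```shell
# Report what "fastq-dump --split-3" left in an output directory.
# Arguments: output directory, accession (for example ERR068552).
# Assumes uncompressed output named ACC_1.fastq, ACC_2.fastq, ACC.fastq.
classify_split3_output() {
    local dir="$1" acc="$2"
    if [ -f "$dir/${acc}_1.fastq" ] && [ -f "$dir/${acc}_2.fastq" ]; then
        if [ -f "$dir/${acc}.fastq" ]; then
            # Mate files plus a file of reads without a mate.
            echo "paired+unpaired"
        else
            echo "paired"
        fi
    elif [ -f "$dir/${acc}.fastq" ]; then
        echo "single"
    else
        echo "none"
    fi
}

# Usage (not executed here):
#   classify_split3_output ~/Desktop/temp.fastaq ERR068552
```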

PS: If the sequencing was done by ligation, the result is color-space reads in 2-base encoding; add the -B option so that the sequences in the fastq file are written as base-space reads.
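Putting the two options together, a small helper can assemble the command line before you run it. This is only a sketch: the function name `fastq_dump_command` and its "colorspace" argument are hypothetical, and it prints the command rather than executing it, so adjust the binary name (for example ./fastq-dump.2) to your installation.

```shell
# Print (not run) a fastq-dump command line for one .sra file.
# Arguments: path to the .sra file, output directory, and optionally
# the word "colorspace" for 2-base-encoded (ligation) data.
fastq_dump_command() {
    local sra="$1" out_dir="$2" kind="${3:-basespace}" flags="--split-3"
    if [ "$kind" = "colorspace" ]; then
        # -B writes base-space sequence instead of color-space digits.
        flags="-B $flags"
    fi
    echo "fastq-dump $flags -O $out_dir $sra"
}

# Usage (not executed here):
#   fastq_dump_command ERR068552.sra out/ colorspace
```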