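I had an SRP accession containing the data I wanted, and I was looking for a quick way to download everything under that accession. That search led me to a tool called Kingfisher.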

https://github.com/wwood/kingfisher-download

Documentation: https://wwood.github.io/kingfisher-download/

Installation: pip install kingfisher

Kingfisher has three main subcommands: get, extract, and annotate.

get

kingfisher get -r ERR1739691 -m ena-ascp
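# Pass a file listing run accessions with --run-identifiers-list
# Pass a project accession (such as an SRP) with --bioprojects
# Choose download methods with --download-methods; several can be given, and Kingfisher tries them one at a time in the order listed. Available methods include ena-ascp, ena-ftp, prefetch, aws-http, aws-cp, gcp-cp
# Set the number of download threads with --download-threads
# Set the path to the SSH key that ascp needs with --ascp-ssh-key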
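
To make the list option and the fallback order concrete, here is a minimal sketch that writes a run list and assembles a matching get command. It assumes the list file holds one accession per line; ERR1739692 and the RUN_KINGFISHER switch are made up for illustration and are not part of Kingfisher.

```shell
# Write a list of run accessions, one per line.
# ERR1739692 is a hypothetical second accession used only for illustration.
runs_file=$(mktemp)
printf '%s\n' ERR1739691 ERR1739692 > "$runs_file"

# Methods are listed in the order they should be tried.
set -- kingfisher get --run-identifiers-list "$runs_file" \
    --download-methods ena-ascp ena-ftp prefetch

# Print the command by default; run it only when RUN_KINGFISHER=1 is set
# (RUN_KINGFISHER is a switch invented for this sketch).
if [ "${RUN_KINGFISHER:-0}" = "1" ]; then
    "$@"
else
    echo "$*"
fi
```

Printing first makes it easy to check the command before starting a large download.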

extract

Converts SRA-format files to FASTQ or FASTA.

kingfisher extract --sra SRA
# Choose the output format with --output-format-possibilities
# Set the number of threads with --threads
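
If a directory already holds several .sra files, the conversion can be looped. The sketch below is one way to do that, assuming fastq.gz is an accepted value for --output-format-possibilities; the extract_all function and the RUN_KINGFISHER switch are names invented here.

```shell
# Print (or run) one kingfisher extract command per .sra file in a directory.
# extract_all and RUN_KINGFISHER are hypothetical names for this sketch.
extract_all() {
    dir=$1
    for sra in "$dir"/*.sra; do
        [ -e "$sra" ] || continue   # skip when the glob matched nothing
        set -- kingfisher extract --sra "$sra" \
            --output-format-possibilities fastq.gz --threads 4
        if [ "${RUN_KINGFISHER:-0}" = "1" ]; then
            "$@"
        else
            echo "$*"
        fi
    done
}

# Example: preview the commands for the current directory.
extract_all .
```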

annotate

Builds a metadata table from each run's annotations, such as the number of bases and the BioSample attributes.

kingfisher annotate --run-identifiers ERR1739691
# Pass a file listing run accessions with --run-identifiers-list
# Pass a project accession (such as an SRP) with --bioprojects
# Choose the output format with --output-format: human, csv, tsv, json, feather, or parquet
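
Once the table is saved as TSV, a quick look at its column names helps decide which fields to keep. The sketch below assumes annotate writes to standard output when no output file is given; list_columns is a helper invented here, and the stand-in table uses made-up column names rather than real Kingfisher output.

```shell
# Print the column names of a TSV file, one per line.
# list_columns is a hypothetical helper for this sketch.
list_columns() {
    head -n 1 "$1" | tr '\t' '\n'
}

# Intended use (needs kingfisher and network access):
#   kingfisher annotate --run-identifiers ERR1739691 --output-format tsv > annotation.tsv
#   list_columns annotation.tsv

# Stand-in table with hypothetical column names, so the helper can be tried offline.
demo=$(mktemp)
printf 'run\tbases\nERR1739691\texample_value\n' > "$demo"
list_columns "$demo"
```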

Examples

Download .fastq.gz files of the run ERR1739691 from the ENA, or failing that, download an .sra file from the Amazon AWS Open Data Program and then convert to FASTQ (not FASTQ.GZ), or failing that use NCBI prefetch to download and convert that to FASTQ. Output files are put into the current working directory.

$ kingfisher get -r ERR1739691 -m ena-ascp aws-http prefetch

Download a .sra from GCP using a service account key with the gcp-cp method. Payment is required.

$ kingfisher get -r ERR1739691 -m gcp-cp -f sra --gcp-user-key-file sa-private-key.json --allow-paid

Download a .sra from the free AWS open data program using 8 threads for download and extraction, converting to FASTA.

$ kingfisher get -r ERR1739691 -m aws-http -f fasta --download-threads 8

My own command

kingfisher get --bioprojects SRP**** --download-methods ena-ascp --ascp-ssh-key ~/miniconda3/envs/download/etc/asperaweb_id_dsa.openssh
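
Since the key path depends on where Aspera was installed, it can be worth checking that the file is readable before starting. A minimal sketch, where get_with_ascp_key is a wrapper invented here and SRP000000 is a placeholder accession rather than a real project:

```shell
# Print a kingfisher get command for ena-ascp, but only if the key file is readable.
# get_with_ascp_key is a hypothetical wrapper for this sketch.
get_with_ascp_key() {
    project=$1
    key=$2
    if [ ! -r "$key" ]; then
        echo "ascp key not readable: $key" >&2
        return 1
    fi
    echo "kingfisher get --bioprojects $project --download-methods ena-ascp --ascp-ssh-key $key"
}

# Example with a placeholder accession and a temporary file standing in for the key.
demo_key=$(mktemp)
get_with_ascp_key SRP000000 "$demo_key"
```

The function only prints the command, so the output can be reviewed and then pasted into the shell or piped to sh.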