对VCF文件中的突变按照染色体和位置进行排序,下面是本人的总结,其中利用bash命令的方法不依赖其他的工具或包。htslib前文中也提到过。
1, Use bash
bash raw.vcf
1
2
3
4
5
6
7
8
9
|
chr_order="chrMnchr1nchr2nchr3nchr4nchr5nchr6nchr7nchr8nchr9nchr10nchr11nchr12nchr13nchr14nchr15nchr16nchr17nchr18nchr19nchr20nchr21nchr22nchrXnchrY"
cat "$1" ' grep "^#" > .header.vcf
cat "$1" ' grep -v "^#" ' sort -k1,1 -k2,2n > .pre.sorted.vcf
echo -e $chr_order ' while read line
do
cat .pre.sorted.vcf ' grep "^$line"$'t' >> .header.vcf
done
mv .header.vcf sorted.vcf && rm .header.vcf .pre.sorted.vcf
|
2,Use awk and sed
(awk ‘/^#/{print}!/^#/{exit}’ raw.vcf;sed ‘/^#/d’ raw.vcf’awk -F"\t" ‘($1~/^[0-9]+$/){sub("^chr","",$0);print $0}‘‘sort -k1,1n -k2,2n’awk ‘{print “chr”$0}’ ;sed ‘/^#/d’ raw.vcf’ awk -F"\t" ‘($1!~/^[0-9]+$/){sub("^chr","",$0);print $0}‘‘sort -k1,1d -k2,2n’awk ‘{print “chr”$0}') > sort.vcf
3, Use Picard
Sorts one or more VCF files. This tool sorts the records in VCF files according to the order of the contigs in the header/sequence dictionary and then by coordinate. It can accept an external sequence dictionary.
java -jar picard.jar SortVcf I=unsort.vcf O=sorted.vcf
cat file.vcf ' vcf-sort > sorted.vcf