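Sort VCF by Chr and Pos
This post summarizes several ways to sort the variants in a VCF file by chromosome and position. The bash-only approach does not depend on any other tools or packages. htslib was also mentioned in an earlier post.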
1, Use bash
bash raw.vcf
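A minimal sketch of a sort that uses only standard shell tools: the header lines are copied first, then the records are sorted by column 1 (chromosome) and column 2 (position). It assumes GNU sort (for the V version-sort key, which orders chr2 before chr10) and tab-separated records; the function name sort_vcf is hypothetical and not taken from any tool.

```shell
# Sketch: sort a VCF with grep and sort only (assumes GNU sort for the V key).
# sort_vcf is a hypothetical helper name: sort_vcf INPUT.vcf OUTPUT.vcf
sort_vcf() {
  in_vcf="$1"
  out_vcf="$2"
  tab="$(printf '\t')"
  # Header lines (starting with '#') are copied first, in their original order.
  grep '^#' "$in_vcf" > "$out_vcf"
  # Records: chromosome in natural (version) order, then position numerically.
  grep -v '^#' "$in_vcf" | LC_ALL=C sort -t "$tab" -k1,1V -k2,2n >> "$out_vcf"
}
```

Usage would look like: sort_vcf raw.vcf sorted.vcf. Note that this orders contigs by name, not by the order of the contig lines in the header.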
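2, Use awk and sed
# Assumes contig names carry a "chr" prefix (chr1, chrX, ...); it is stripped for sorting and restored afterwards.
(awk '/^#/{print} !/^#/{exit}' raw.vcf   # header lines
 sed '/^#/d' raw.vcf | awk -F"\t" '{sub(/^chr/,"")} $1 ~ /^[0-9]+$/' | sort -k1,1n -k2,2n | awk '{print "chr"$0}'    # numeric chromosomes, numeric order
 sed '/^#/d' raw.vcf | awk -F"\t" '{sub(/^chr/,"")} $1 !~ /^[0-9]+$/' | sort -k1,1d -k2,2n | awk '{print "chr"$0}'   # X, Y, M, ..., dictionary order
) > sort.vcf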
3, Use Picard
Sorts one or more VCF files. This tool sorts the records in VCF files according to the order of the contigs in the header/sequence dictionary and then by coordinate. It can accept an external sequence dictionary.
java -jar picard.jar SortVcf I=unsort.vcf O=sorted.vcf
4, Use vcf-sort (in vcftools)
cat file.vcf | vcf-sort > sorted.vcf
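Whichever method is used, the result can be sanity-checked with a short awk pass. The sketch below is an assumption-laden check, not part of any of the tools above: the helper name check_vcf_sorted is hypothetical, and it only verifies that records for each chromosome are contiguous and that positions never decrease within a chromosome. It does not check that chromosomes follow any particular order.

```shell
# Sketch: exit status 0 if the VCF looks sorted, non-zero otherwise.
# check_vcf_sorted is a hypothetical helper name: check_vcf_sorted FILE.vcf
check_vcf_sorted() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    ($1 "") != chrom {                     # a new chromosome block starts (string comparison)
      if ($1 in seen) { bad = 1; exit }    # chromosome reappeared: records not grouped
      seen[$1] = 1; chrom = $1 ""; pos = 0
    }
    $2 + 0 < pos { bad = 1; exit }         # position went backwards
    { pos = $2 + 0 }
    END { exit bad }
  ' "$1"
}
```

For example: check_vcf_sorted sorted.vcf && echo "sorted".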