Calculate transcript length from a GTF file

Input GTF file

chr1    PacBio  exon    763020  763155  .       +       .       gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "LINC01128"; oId "PB.5.1"; nearest_ref "NR_047519"; class_code "j"; tss_id "TSS1";
chr1    PacBio  exon    764383  764484  .       +       .       gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; gene_name "LINC01128"; oId "PB.5.1"; nearest_ref "NR_047519"; class_code "j"; tss_id "TSS1";
chr1    PacBio  exon    776580  776753  .       +       .       gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "3"; gene_name "LINC01128"; oId "PB.5.1"; nearest_ref "NR_047519"; class_code "j"; tss_id "TSS1";

Code

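awk -F"\t" '
# GTF is tab-separated: column 3 is the feature type, columns 4/5 are start/end,
# and column 9 holds the attributes
$3 == "exon" {
    # split the attributes on spaces: a[1]=gene_id, a[2]="XLOC_000001";,
    # a[3]=transcript_id, a[4]="TCONS_00000001"; (quotes and semicolon are kept).
    # This assumes transcript_id is the second attribute, as in the input above.
    split($9, a, " ")
    ID = a[4]
    # GTF coordinates are 1-based and inclusive, so exon length = end - start + 1
    L[ID] += $5 - $4 + 1
}
END {
    # one line per transcript: ID <tab> total exon length
    for (i in L)
        print i "\t" L[i]
}
' gtf-file > output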

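You can change the array index and the feature type to choose what is counted. For example, to calculate gene length, change exon to gene (make sure your GTF file actually contains gene features) and key on the gene_id value, which is a[2] when gene_id is the first attribute, as in the input above.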
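Both choices can also be passed in as arguments. The sketch below wraps the command in a shell function; feature_lengths and its argument order are hypothetical names chosen for illustration. Instead of split() and a fixed index, it looks the attribute up by name with awk's match(), so it does not depend on attribute order and prints the ID without quotes or semicolon. It assumes every attribute value is double-quoted, as in the input above.

```shell
# Hypothetical helper: feature_lengths FILE FEATURE ATTRIBUTE
#   FILE      GTF file (tab-separated, 9 columns)
#   FEATURE   value to select in column 3, e.g. exon or gene
#   ATTRIBUTE attribute in column 9 used as the key, e.g. transcript_id or gene_id
# Sums end - start + 1 per key and prints: key <tab> length
feature_lengths() {
    awk -F'\t' -v feat="$2" -v key="$3" '
    $3 == feat {
        # prefix "; " so the attribute name is matched at its start,
        # even when it is the first attribute in column 9
        attrs = "; " $9
        if (match(attrs, "; *" key " \"[^\"]*\"")) {
            id = substr(attrs, RSTART, RLENGTH)
            sub(/^[^"]*"/, "", id)    # drop everything up to and including the opening quote
            sub(/"$/, "", id)         # drop the closing quote
            len[id] += $5 - $4 + 1    # GTF coordinates are 1-based and inclusive
        }
    }
    END {
        for (id in len)
            print id "\t" len[id]
    }
    ' "$1"
}

# Transcript length (sum of exon lengths), as in the command above:
#   feature_lengths gtf-file exon transcript_id > output
# Gene length, for a GTF that contains gene features:
#   feature_lengths gtf-file gene gene_id > output
```

Note that summing exon lengths per gene_id (feature_lengths gtf-file exon gene_id) adds up the exons of every isoform, so an exon shared by several transcripts is counted more than once.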

Output

"TCONS_00052437";       5950
"TCONS_00049398";       988
"TCONS_00031225";       6005
"TCONS_00026369";       825
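The IDs above still carry the double quotes and trailing semicolon from column 9, because split() only breaks on spaces, and the rows come out in no particular order because awk does not define the iteration order of for (i in L). If you keep the original command, a minimal cleanup sketch follows; clean_lengths is a hypothetical name, and it assumes output is the file written above and that the IDs contain no quotes or semicolons of their own.

```shell
# Hypothetical helper: clean_lengths FILE
# Removes the double quotes and semicolons left on each ID,
# then sorts the rows (byte order) so the result is reproducible.
clean_lengths() {
    tr -d '";' < "$1" | LC_ALL=C sort
}

# usage, assuming "output" was written by the awk command above:
#   clean_lengths output > output.clean
```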

Reference: https://gist.github.com/sp00nman/10372555
