GTF file (INPUT)
chr1 PacBio exon 763020 763155 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "LINC01128"; oId "PB.5.1"; nearest_ref "NR_047519"; class_code "j"; tss_id "TSS1";
chr1 PacBio exon 764383 764484 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; gene_name "LINC01128"; oId "PB.5.1"; nearest_ref "NR_047519"; class_code "j"; tss_id "TSS1";
chr1 PacBio exon 776580 776753 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "3"; gene_name "LINC01128"; oId "PB.5.1"; nearest_ref "NR_047519"; class_code "j"; tss_id "TSS1";
Code
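# Sum exon lengths per transcript; the input must be tab-separated.
awk -F"\t" '
$3 == "exon" {
    # Split the attribute column (field 9) on spaces. With the attribute
    # order of the sample above, a[2] is the gene_id value and a[4] is the
    # transcript_id value (quotes and trailing semicolon included).
    split($9, a, " ")
    ID = a[4]
    # GTF coordinates are 1-based and inclusive, hence the +1.
    L[ID] += $5 - $4 + 1
}
END {
    # Print each ID with its summed length; the order is unspecified.
    for (i in L)
        print i "\t" L[i]
}
' gtf-file > output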
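You can change the array index and the feature type to choose what is summed. For example, to calculate gene length, change exon to gene (make sure your GTF file has this feature) and, if gene_id is the first attribute as in the sample above, use a[2] as the ID. Note that a gene feature covers the whole genomic span of the gene, introns included.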
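Picking the ID by position (a[4]) breaks when the attribute order differs between files. Below is a minimal sketch of a variant that looks the attribute up by name and strips the quotes. The awk variables `feature`, `key`, and `total`, and the temporary sample file, are assumptions made for this illustration and are not part of the original command. It also assumes that no attribute value contains a semicolon.

```shell
# Hypothetical three-exon sample built from the coordinates shown above;
# printf reuses the format string, writing one tab-separated line per exon.
sample=$(mktemp)
attrs='gene_id "XLOC_000001"; transcript_id "TCONS_00000001";'
printf 'chr1\tPacBio\texon\t%s\t%s\t.\t+\t.\t%s\n' \
    763020 763155 "$attrs" \
    764383 764484 "$attrs" \
    776580 776753 "$attrs" > "$sample"

# feature and key are illustrative variables, set with -v on the command line.
result=$(awk -F"\t" -v feature="exon" -v key="transcript_id" '
$3 == feature {
    n = split($9, attr, ";")                 # one attribute per array element
    for (j = 1; j <= n; j++) {
        field = attr[j]
        sub(/^ +/, "", field)                # drop leading spaces
        if (index(field, key " ") == 1) {    # attribute name matches key
            id = substr(field, length(key) + 2)
            gsub(/"/, "", id)                # remove the surrounding quotes
            total[id] += $5 - $4 + 1         # inclusive coordinates
            break
        }
    }
}
END {
    for (id in total)
        print id "\t" total[id]
}
' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

For the three sample exons the lengths are 136, 102, and 174, so the sketch prints TCONS_00000001 with a total of 412. Passing `-v feature="gene" -v key="gene_id"` would switch it to gene lines, provided the file contains them. The OUTPUT listing in this post comes from the original command, which keeps the quotes and the semicolon.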
OUTPUT
"TCONS_00052437"; 5950
"TCONS_00049398"; 988
"TCONS_00031225"; 6005
"TCONS_00026369"; 825
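The IDs above still carry the quotes and the semicolon from the attribute column. As a rough sketch, they can be cleaned up after the fact and sorted by length. The pipeline below is an assumption about how one might post-process the file, not part of the original recipe, and it feeds in the four lines shown above with a tab between ID and length.

```shell
# The four output lines shown above, with a real tab between ID and length.
cleaned=$(printf '"TCONS_00052437";\t5950\n"TCONS_00049398";\t988\n"TCONS_00031225";\t6005\n"TCONS_00026369";\t825\n' |
    tr -d '";' |        # delete every double quote and semicolon
    sort -k2,2nr)       # sort numerically on the length column, longest first

printf '%s\n' "$cleaned"
```

On a real run, the same `tr` and `sort` steps could read the output file directly instead of the inline `printf`.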
Reference: https://gist.github.com/sp00nman/10372555