月度归档:2020年10月

Parse gtf

I always use gtf file and retrieve gene information. There isn’t a highly flexible tool to solve my demand. I modified the code from “https://github.com/Jverma/GFF-Parser”, thanks Jverma. This tool will be easier to use.

/wp/f4w/2020/2020-10-02-gtf-parser.gif

Usage

Basically, there are three parameters.

id: either transcript id or gene id.

attType: attribute defined in gtf file. E.g. feature (column 3), gene_name, exon_number, transcript_id in column 9

attValue: the attribute value you want to search for.

>>> import sys
>>> from gtfParser import gtfParser

>>> gtf = gtfParser("example.gtf")

>>> # Get all exons in CDK7
>>> gtf.getRecordsByID("CDK7", "feature", "exon")

>>> # Get all features of transcript_id defined as "NM_001324069" in gene "CDK7"
>>> gtf.getRecordsByID("CDK7", "transcript_id", "NM_001324069")

>>> # Get start codon where feature was defined as "start_codon" in transcript "NM_001324069"
>>> gtf.getRecordsByID("NM_001324069", "feature", "start_codon")

>>> # Get a exon where its id is "NM_001324078.1" in "NM_001324078" transcript
>>> gtf.getRecordsByID("NM_001324078", "exon_id", "NM_001324078.1")

# Example gtf
Here is an simple example of gtf file. You can use to test. A subset from refSeq.hg38.gtf.

继续阅读