Genomic Annotation

Genomic Annotation Modules

These modules contain functions and classes for working with genomic annotation. They provide utilities for loading GTF or GFF files and extracting genomic annotations from them.

class genomkit.annotation.gannotation.GAnnotation(file_path: str, file_format: str)

GAnnotation module

This module contains functions and classes for working with genomic annotation files in GTF or GFF format, either plain or gzip-compressed (gtf, gtf.gz, gff, or gff.gz).
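As a rough illustration of what reading these formats involves, the sketch below opens a plain or gzip-compressed GTF file and parses each feature line into a dict. It uses only the standard library and is not genomkit's implementation: the function names (open_annotation, parse_gtf_attributes, read_gtf) and the "type" key are hypothetical, the attribute parser assumes no semicolons inside quoted values, and GFF3's key=value attribute syntax is not handled.

```python
import gzip


def open_annotation(file_path):
    """Open a plain or gzip-compressed annotation file in text mode."""
    if file_path.endswith(".gz"):
        return gzip.open(file_path, "rt")
    return open(file_path, "r")


def parse_gtf_attributes(field):
    """Turn 'gene_id "G1"; gene_name "ABC";' into a dict (GTF syntax only)."""
    attributes = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip('"')
    return attributes


def read_gtf(file_path):
    """Yield one dict per feature line, skipping comments and blank lines."""
    with open_annotation(file_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            cols = line.rstrip("\n").split("\t")
            yield {
                "chr": cols[0],
                "type": cols[2],  # hypothetical key for the feature type
                "start": int(cols[3]),
                "end": int(cols[4]),
                "strand": cols[6],
                **parse_gtf_attributes(cols[8]),
            }
```

Dispatching on the .gz suffix keeps the parsing code identical for compressed and uncompressed input.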

filter_elements(element_type, attribute=None, value=None)

Filter elements (genes, transcripts, exons) based on attribute criteria.

Parameters:
  • element_type – Type of elements to filter (‘gene’, ‘transcript’, ‘exon’).

  • attribute – Attribute to filter on (e.g., ‘biotype’).

  • value – Value of the attribute to filter on.

Returns:

List of filtered elements.
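The filtering semantics can be sketched on plain dictionaries. This is an illustration under assumptions, not the library's code: filter_records is a hypothetical helper, the records and their keys are made up, and returning every record when no attribute is given is an assumed behavior.

```python
def filter_records(elements, attribute=None, value=None):
    """Return the records whose `attribute` equals `value`.

    Assumption: with no attribute given, every record is returned.
    """
    if attribute is None:
        return list(elements.values())
    return [rec for rec in elements.values() if rec.get(attribute) == value]


# Hypothetical gene records keyed by gene id.
genes = {
    "G1": {"gene_name": "ABC", "biotype": "protein_coding"},
    "G2": {"gene_name": "XYZ", "biotype": "lncRNA"},
}
coding = filter_records(genes, attribute="biotype", value="protein_coding")
```

Using rec.get(attribute) means records that lack the attribute are simply excluded rather than raising a KeyError.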

get_exon(exon_id)

Get the annotation of an exon by exon id.

Parameters:

exon_id (str) – ID of the exon to look up.

Returns:

Annotation of the exon.

Return type:

dict

get_exon_ids()

Return all exon ids in the annotation.

Returns:

A list of all exon ids

Return type:

list

get_gene(gene_id: str)

Get the annotation of a gene by gene id.

Parameters:

gene_id (str) – ID of the gene to look up.

Returns:

Annotation of the gene.

Return type:

dict

get_gene_ids()

Return all gene ids in the annotation.

Returns:

A list of all gene ids

Return type:

list
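The two gene accessors combine naturally: iterate over get_gene_ids() and look each id up with get_gene(). The sketch below relies only on those two documented method names. collect_genes is a hypothetical helper, and StubAnnotation is a stand-in with made-up records so the example runs without an annotation file; in practice a loaded GAnnotation object would be passed instead.

```python
def collect_genes(annotation):
    """Map every gene id to its annotation dict."""
    return {
        gene_id: annotation.get_gene(gene_id)
        for gene_id in annotation.get_gene_ids()
    }


class StubAnnotation:
    """Stand-in exposing the two documented methods, for illustration only."""

    def __init__(self, genes):
        self._genes = genes

    def get_gene_ids(self):
        return list(self._genes)

    def get_gene(self, gene_id):
        return self._genes[gene_id]


# Hypothetical records; the keys of a real annotation dict may differ.
stub = StubAnnotation({"G1": {"gene_name": "ABC"}, "G2": {"gene_name": "XYZ"}})
genes_by_id = collect_genes(stub)
```

The same pattern should apply to get_transcript_ids() with get_transcript(), and to get_exon_ids() with get_exon().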

get_gene_names()

Return all gene names in the annotation.

Returns:

A list of all gene names

Return type:

list

get_regions(element_type: str, attribute: str | None = None, value=None)

Return the elements selected by the given filter as GRegions.

Parameters:
  • element_type (str) – gene, transcript, or exon

  • attribute (str, optional) – Attribute for filtering such as ‘chr’, ‘start’, ‘end’, ‘strand’, ‘gene_name’, ‘gene_type’, defaults to None

  • value (str or int, optional) – Value of the attribute, defaults to None

Returns:

A GRegions object containing the matching elements.

Return type:

GRegions
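A call might look like the sketch below, which uses only the documented signature. gene_regions is a hypothetical wrapper, "protein_coding" is an assumed example value for gene_type, and the idea that omitting the attribute selects every element of that type is an assumption about the defaults. The recording stub stands in for a loaded GAnnotation object so the example runs on its own.

```python
def gene_regions(annotation, gene_type=None):
    """Return regions for all genes, or only those with one gene_type."""
    if gene_type is None:
        return annotation.get_regions("gene")
    return annotation.get_regions("gene", attribute="gene_type", value=gene_type)


class RecordingStub:
    """Stand-in that echoes the arguments it receives, for illustration only."""

    def get_regions(self, element_type, attribute=None, value=None):
        return (element_type, attribute, value)


stub = RecordingStub()
all_genes = gene_regions(stub)
coding_genes = gene_regions(stub, "protein_coding")
```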

get_transcript(transcript_id)

Get the annotation of a transcript by transcript id.

Parameters:

transcript_id (str) – ID of the transcript to look up.

Returns:

Annotation of the transcript.

Return type:

dict

get_transcript_ids()

Return all transcript ids in the annotation.

Returns:

A list of all transcript ids

Return type:

list

load_data()

Load the annotation file.