Genomic Sequences

Genomic Sequences Modules

These modules contain functions and classes for working with genomic sequences. These sequences are usually stored in FASTQ or FASTA files.

  • GSequence is a single sequence.

  • GSequences is a collection of many GSequence objects.

  • GSequencesSet is a set of GSequences objects, each representing a different genomic element.

class genomkit.sequences.gsequence.GSequence(sequence: str, quality: str = '', name: str = '', data: list = [])

GSequence module

This module contains functions and classes for working with a genomic sequence.

complement()

Convert the sequence into its complement.
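To illustrate what complementing a sequence means, the sketch below maps each base to its Watson-Crick partner in plain Python. It is not genomkit's implementation: the helper name `complement_dna` and the handling of `N` and lowercase letters are assumptions made for the example.

```python
# Hypothetical helper, not part of genomkit: complement a DNA string.
# Assumption: N maps to N, case is preserved, other characters pass through.
_DNA_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def complement_dna(sequence: str) -> str:
    """Return the base-wise complement of a DNA string."""
    return sequence.translate(_DNA_COMPLEMENT)


print(complement_dna("ATGCN"))  # TACGN
```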

count_table()

Return a dictionary with the number of occurrences of each nucleotide in the sequence.

Returns:

A count table

Return type:

dict
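A count table of this kind can be sketched with `collections.Counter`. The function below is a hypothetical stand-in, not the library's code. It assumes counting is case-insensitive and that every symbol present (including `N`) gets an entry, and the actual method may differ on both points.

```python
from collections import Counter


def count_nucleotides(sequence: str) -> dict:
    """Count each symbol in the sequence (hypothetical helper).

    Assumption: the sequence is upper-cased before counting.
    """
    return dict(Counter(sequence.upper()))


print(count_nucleotides("AACGTTTN"))
# {'A': 2, 'C': 1, 'G': 1, 'T': 3, 'N': 1}
```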

dna2rna()

Convert DNA sequence to RNA sequence by replacing “T” with “U”.

reverse()

Reverse the sequence, and reverse the quality string as well if one is present.

reverse_complement()

Convert the sequence into its reverse complement, and reverse the quality string as well if one is present.
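The relationship between the sequence and its quality string can be sketched as follows. This is plain Python rather than the GSequence method, and the function name is hypothetical. The quality string is reversed but not complemented, because each quality score belongs to a base position and has to stay aligned with that base.

```python
def reverse_complement_with_quality(sequence: str, quality: str = ""):
    """Return (reverse-complemented sequence, reversed quality).

    Hypothetical helper. Assumption: upper-case DNA with optional N.
    """
    table = str.maketrans("ACGTN", "TGCAN")
    # Complement every base, then reverse the whole string.
    rc_sequence = sequence.translate(table)[::-1]
    # Quality scores are only reversed so they stay aligned with their bases.
    return rc_sequence, quality[::-1]


print(reverse_complement_with_quality("AACG", "IIH#"))  # ('CGTT', '#HII')
```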

rna2dna()

Convert RNA sequence to DNA sequence by replacing “U” with “T”.
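The two conversions are plain character substitutions, as this sketch shows. The function names are hypothetical, and the sketch assumes upper-case input. Whether the library also converts lowercase `t`/`u` is not stated here.

```python
def dna_to_rna(sequence: str) -> str:
    """Replace every T with U (hypothetical helper, upper-case only)."""
    return sequence.replace("T", "U")


def rna_to_dna(sequence: str) -> str:
    """Replace every U with T (hypothetical helper, upper-case only)."""
    return sequence.replace("U", "T")


rna = dna_to_rna("ATGT")
print(rna)              # AUGU
print(rna_to_dna(rna))  # ATGT
```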

slice_sequence(start, end)

Return the subsequence between the given start and end positions.

Parameters:
  • start (int) – Start position

  • end (int) – End position

Returns:

Sequence

Return type:

str
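The coordinate convention is not spelled out above. The sketch below assumes Python-style coordinates (0-based start, exclusive end). That is an assumption for illustration only, and the actual method may count positions differently.

```python
def slice_by_positions(sequence: str, start: int, end: int) -> str:
    """Return sequence[start:end] (hypothetical helper).

    Assumption: 0-based start, exclusive end.
    """
    return sequence[start:end]


print(slice_by_positions("ACGTACGT", 2, 5))  # GTA
```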

trim(start: int = 0, end: int = 0)

Remove the given number of nucleotides from the start and/or the end of the sequence.

Parameters:
  • start (int, optional) – Number of nucleotides to remove from the start, defaults to 0

  • end (int, optional) – Number of nucleotides to remove from the end, defaults to 0
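Trimming can be sketched with a slice. One detail matters: `sequence[start:-end]` returns an empty string when `end` is 0, so the stop index is computed from the length instead. The helper below is hypothetical and assumes non-negative arguments. For FASTQ data the quality string would need the same trimming to stay aligned.

```python
def trim_sequence(sequence: str, start: int = 0, end: int = 0) -> str:
    """Drop `start` characters from the front and `end` from the back.

    Hypothetical helper. Assumption: start and end are non-negative.
    """
    # Compute the stop index explicitly; clamp at 0 when end exceeds the length.
    stop = max(len(sequence) - end, 0)
    return sequence[start:stop]


print(trim_sequence("ACGTACGT", start=2, end=3))  # GTA
print(trim_sequence("ACGT"))                      # ACGT
```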

class genomkit.sequences.gsequences.GSequences(name: str = '', load: str = '')

GSequences module

This module contains functions and classes for working with a collection of genomic sequences. It provides utilities for handling and analyzing the interactions of many genomic sequences.

add(sequence)

Append a GSequence to the end of the collection.

Parameters:

sequence (GSequence) – A GSequence

complement()

Convert every sequence in the collection into its complement.

count_table()

Return a dictionary with the number of occurrences of each nucleotide across the sequences.

Returns:

A count table

Return type:

dict

dna2rna()

Convert DNA sequences to RNA sequences by replacing “T” with “U”.

extract_seqs_by_regions(regions)

Return a new GSequences containing the sequences covered by the given GRegions.

Parameters:

regions (GRegions) – A GRegions
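Region-based extraction amounts to looking up each region's sequence by name and slicing it. The sketch below uses plain Python stand-ins: a dict of name to sequence in place of GSequences, and `(name, start, end)` tuples in place of GRegions. The 0-based, end-exclusive coordinates, the key format of the result, and the skipping of unknown names are all assumptions for illustration.

```python
def extract_by_regions(records: dict, regions: list) -> dict:
    """Slice each named region out of `records` (hypothetical helper).

    records: {sequence name: sequence string}
    regions: list of (name, start, end), 0-based and end-exclusive (assumed).
    """
    extracted = {}
    for name, start, end in regions:
        if name not in records:
            continue  # assumption: regions on unknown sequences are skipped
        extracted[f"{name}:{start}-{end}"] = records[name][start:end]
    return extracted


genome = {"chr1": "ACGTACGT", "chr2": "GGCCAA"}
wanted = [("chr1", 0, 4), ("chr2", 2, 6), ("chrX", 0, 1)]
print(extract_by_regions(genome, wanted))
# {'chr1:0-4': 'ACGT', 'chr2:2-6': 'CCAA'}
```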

get_sequence(name, start, end)

Return the sequence with the given name, restricted to the given start and end positions.

Parameters:
  • name (str) – Sequence name

  • start (int) – Start position

  • end (int) – End position

Returns:

The requested sequence

Return type:

GSequence

load(filename: str)

Load a FASTA/FASTQ file into the GSequences.

Parameters:

filename (str) – Path to the FASTA/FASTQ file
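For readers unfamiliar with the FASTA layout, the sketch below parses it in plain Python: a header line starting with `>` followed by one or more sequence lines. This is a minimal reader written for illustration, not the parser genomkit uses. It reads from any file-like object, ignores FASTQ, and takes the first whitespace-delimited token of the header as the name, all of which are assumptions.

```python
import io


def read_fasta(handle) -> dict:
    """Parse FASTA text from a file-like object into {name: sequence}.

    Hypothetical helper; multi-line sequences are joined together.
    """
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the name is the first token of the header line.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


example = io.StringIO(">chr1 demo\nACGT\nTTGA\n>chr2\nGGCC\n")
print(read_fasta(example))  # {'chr1': 'ACGTTTGA', 'chr2': 'GGCC'}
```

With a file on disk, the same function can be called as `read_fasta(open(path))`, ideally inside a `with` block.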

reverse()

Reverse every sequence in the collection.

reverse_complement()

Convert every sequence in the collection into its reverse complement.

rna2dna()

Convert RNA sequences to DNA sequences by replacing “U” with “T”.

trim(start: int = 0, end: int = 0)

Remove the given number of nucleotides from the start and/or the end of every sequence.

Parameters:
  • start (int, optional) – Number of nucleotides to remove from the start, defaults to 0

  • end (int, optional) – Number of nucleotides to remove from the end, defaults to 0