getGenes

Return table of unique genes in GTFAnnotation object

Description

genes = getGenes(AnnotObj) returns genes, a table of genes referenced by exons in AnnotObj.

genes = getGenes(AnnotObj,"Reference",R) returns one or more genes that belong to one or more references specified by R.

genes = getGenes(AnnotObj,"Gene",G) returns one or more genes specified by G.

genes = getGenes(AnnotObj,"Transcript",T) returns one or more genes that contain one or more transcripts specified by T.

Examples

Create a GTFAnnotation object from a GTF-formatted file.

obj = GTFAnnotation("hum37_2_1M.gtf");

Retrieve unique reference names. In this case, there is only one reference sequence, which is chromosome 2 (chr2).

ref = getReferenceNames(obj)
ref = 1×1 cell array
    {'chr2'}

Get a table of all genes that belong to chr2.

genes = getGenes(obj,"Reference",ref)
genes=28×7 table
        GeneID         GeneName     Reference    Start      Stop     Strand    NumTranscripts
    ______________    __________    _________    ______    ______    ______    ______________

    {'uc010yim.1'}    {0×0 char}      chr2        41609     46385      -             1       
    {'uc002qvu.2'}    {0×0 char}      chr2       218138    249852      -             1       
    {'uc002qvv.2'}    {0×0 char}      chr2       218138    256690      -             1       
    {'uc002qvw.2'}    {0×0 char}      chr2       218138    260702      -             1       
    {'uc002qvx.2'}    {0×0 char}      chr2       218138    264068      -             1       
    {'uc002qvy.2'}    {0×0 char}      chr2       218138    264068      -             1       
    {'uc002qvz.2'}    {0×0 char}      chr2       218138    264392      -             1       
    {'uc002qwa.2'}    {0×0 char}      chr2       218138    264743      -             1       
    {'uc010ewe.2'}    {0×0 char}      chr2       218138    264810      -             1       
    {'uc002qwb.2'}    {0×0 char}      chr2       239563    242178      -             1       
    {'uc002qwc.1'}    {0×0 char}      chr2       243503    262786      -             1       
    {'uc002qwd.2'}    {0×0 char}      chr2       264869    272481      +             1       
    {'uc002qwe.3'}    {0×0 char}      chr2       264869    273148      +             1       
    {'uc002qwg.2'}    {0×0 char}      chr2       264869    278280      +             1       
    {'uc002qwh.2'}    {0×0 char}      chr2       264869    278280      +             1       
    {'uc002qwf.2'}    {0×0 char}      chr2       264869    278280      +             1       
      ⋮
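The "Gene" and "Transcript" syntaxes can be used in the same way. The following is a minimal sketch, assuming the same hum37_2_1M.gtf file as above. The gene ID is taken from the table displayed above, and picking the first transcript name is an arbitrary choice for illustration.

```matlab
% Create the annotation object from the same GTF file as above.
obj = GTFAnnotation("hum37_2_1M.gtf");

% Look up a single gene by its ID (ID copied from the table shown above).
g = getGenes(obj,"Gene","uc002qvu.2");

% Look up the gene that contains a given transcript.
% getTranscriptNames returns the unique transcript names in obj;
% the first name is used here only as an example.
tNames = getTranscriptNames(obj);
gt = getGenes(obj,"Transcript",tNames(1));
```

Both calls return a table with the same seven variables as the table above.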

Input Arguments

AnnotObj: GTF annotation, specified as a GTFAnnotation object.

R: Names of reference sequences, specified as a character vector, string, string vector, cell array of character vectors, or categorical array.

The names must come from the Reference field of AnnotObj. If a name does not exist, the function issues a warning and ignores the name.

Data Types: char | string | cell | categorical

G: Names of genes, specified as a character vector, string, string vector, cell array of character vectors, or categorical array.

The names must come from the Gene field of AnnotObj. If a name does not exist, the function issues a warning and ignores the name.

Data Types: char | string | cell | categorical

T: Names of transcripts, specified as a character vector, string, string vector, cell array of character vectors, or categorical array.

The names must come from the Transcript field of AnnotObj. If a name does not exist, the function issues a warning and ignores the name.

Data Types: char | string | cell | categorical
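The handling of unknown names described above can be illustrated with a short sketch. It assumes the hum37_2_1M.gtf file from the Examples section; "notAGene" is a hypothetical name chosen because it should not appear in that file.

```matlab
% Assumes the sample file used in the Examples section.
obj = GTFAnnotation("hum37_2_1M.gtf");

% Pass several gene names at once as a string vector.
% "notAGene" is a made-up name; per the description above, the function
% is expected to warn about it and ignore it.
names = ["uc002qvu.2","notAGene"];
genes = getGenes(obj,"Gene",names);

% Only names that exist in the Gene field should appear in the result.
disp(genes.GeneID)
```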

Output Arguments

genes: Genes referenced by exons in AnnotObj, returned as a table. The table contains the following variables for each gene.

Variable Name     Description
GeneID            Cell array of character vectors containing gene IDs as listed in AnnotObj, obtained from the Gene field of AnnotObj.
GeneName          Cell array of character vectors containing gene names, obtained from the Attributes field of AnnotObj. This cell array can contain empty character vectors if the corresponding gene names are not found in Attributes.
Reference         Categorical array representing the names of reference sequences to which the genes belong, obtained from the Reference field of AnnotObj.
Start             Start location of the first exon in each gene.
Stop              Stop location of the last exon in each gene.
Strand            Categorical array containing the strand of each gene.
NumTranscripts    Integer array listing the number of transcripts in each gene.
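Because the output is an ordinary MATLAB table, its variables can be used for filtering with standard table indexing. The following is a minimal sketch, assuming the hum37_2_1M.gtf file from the Examples section. The span calculation assumes that Start and Stop are inclusive 1-based coordinates, as in the GTF format.

```matlab
% Assumes the sample file used in the Examples section.
obj = GTFAnnotation("hum37_2_1M.gtf");
genes = getGenes(obj);

% Keep only genes on the forward strand (Strand is a categorical array).
plusGenes = genes(genes.Strand == '+',:);

% Keep only genes with more than one transcript.
multiTx = genes(genes.NumTranscripts > 1,:);

% Genomic span of each gene, assuming inclusive coordinates.
span = genes.Stop - genes.Start + 1;
```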

Version History

Introduced in R2014b