
dimercount

Count dimers in nucleotide sequence

Syntax

Dimers = dimercount(SeqNT)
[Dimers, Percent] = dimercount(SeqNT)
... = dimercount(SeqNT, 'Ambiguous', AmbiguousValue)
... = dimercount(SeqNT, 'Chart', ChartValue)

Input Arguments

SeqNT

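One of the following:

  • Character vector or string containing nucleotide letter codes

  • Row vector of integers representing nucleotides

  • MATLAB structure containing a Sequence field that holds a nucleotide sequence

Examples: 'ACGT' or [1 2 3 4]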

AmbiguousValue

Character vector or string specifying how to treat dimers containing ambiguous nucleotide characters (R, Y, K, M, S, W, B, D, H, V, or N). Choices are:

  • 'ignore' (default) — Skips dimers containing ambiguous characters

  • 'bundle' — Counts dimers containing ambiguous characters and reports the total count in the Ambiguous field of the Dimers output structure.

  • 'prorate' — Counts dimers containing ambiguous characters and distributes them proportionately in the appropriate dimer fields containing standard nucleotide characters. For example, the counts for the dimer AR are distributed evenly between the AA and AG fields.

  • 'warn' — Skips dimers containing ambiguous characters and displays a warning.

ChartValue

Character vector or string specifying a chart type. Choices are 'pie' or 'bar'.

Output Arguments

Dimers

MATLAB structure containing the fields AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, and TT, which contain the dimer counts in SeqNT.

Percent

A 4-by-4 matrix with the relative proportions of the dimers in SeqNT. The rows correspond to A, C, G, and T in the first element of the dimer, and the columns correspond to A, C, G, and T in the second element of the dimer.

Description

Dimers = dimercount(SeqNT) counts the nucleotide dimers in SeqNT, a nucleotide sequence, and returns the dimer counts in Dimers, a MATLAB structure containing the fields AA, AC, AG, AT, CA, CC, CG, CT, GA, GC, GG, GT, TA, TC, TG, and TT.

  • For sequences that have dimers with the character U, these dimers are added to the corresponding dimers containing a T.

  • If the sequence contains gaps indicated by a hyphen (-), the gaps are ignored, and the two characters on either side of the gap are counted as a dimer.

  • If the sequence contains unrecognized characters, then dimers containing these characters are ignored, and the following warning message appears:

    Warning: Unknown symbols appear in the sequence. These will be ignored.
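
The rules above for U and for gaps can be checked on very short inputs. The following is a minimal sketch, not part of the original page: both sequences are hypothetical, and the expected values in the comments are derived from the rules as stated here rather than from a recorded run.

```matlab
% Hypothetical inputs chosen to exercise one rule each.
rna    = dimercount('ACGU');    % dimers AC, CG, GU; GU is tallied as GT
gapped = dimercount('AC-GT');   % the hyphen is skipped, so C and G form CG

disp(rna.GT)       % expected 1 under the rule for U
disp(gapped.CG)    % expected 1 under the rule for gaps
```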

[Dimers, Percent] = dimercount(SeqNT) returns Percent, a 4-by-4 matrix with the relative proportions of the dimers in SeqNT. The rows correspond to A, C, G, and T in the first element of the dimer, and the columns correspond to A, C, G, and T in the second element of the dimer.
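
To make the row and column convention concrete, the sketch below looks up one dimer in Percent by nucleotide. The input sequence and the variable names bases, row, col, and pCG are illustrative assumptions. The expected value of 1/3 assumes that each proportion is the dimer count divided by the total number of dimers.

```matlab
[Dimers, Percent] = dimercount('ACGT');   % three dimers: AC, CG, GT

bases = 'ACGT';               % row and column order of Percent
row = find(bases == 'C');     % first nucleotide of the dimer
col = find(bases == 'G');     % second nucleotide of the dimer
pCG = Percent(row, col);      % proportion of CG dimers, expected 1/3
```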

... = dimercount(SeqNT, 'Ambiguous', AmbiguousValue) specifies how to treat dimers containing ambiguous nucleotide characters. Choices are:

  • 'ignore' (default)

  • 'bundle'

  • 'prorate'

  • 'warn'
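
The difference between the options is easiest to see on a sequence with a single ambiguous dimer. This is a minimal sketch with a hypothetical input. The commented results follow from the option descriptions under Input Arguments and are not a recorded run.

```matlab
seq = 'ACGTAR';   % hypothetical input; the last dimer, AR, contains the ambiguous code R

ignored  = dimercount(seq);                           % default 'ignore': AR is skipped
bundled  = dimercount(seq, 'Ambiguous', 'bundle');    % AR is tallied in bundled.Ambiguous
prorated = dimercount(seq, 'Ambiguous', 'prorate');   % AR is split evenly between AA and AG

disp(bundled.Ambiguous)            % expected 1
disp([prorated.AA prorated.AG])    % expected 0.5 0.5
```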

... = dimercount(SeqNT, 'Chart', ChartValue) creates a chart showing the relative proportions of the dimers. ChartValue can be 'pie' or 'bar'.
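
As a brief illustration, the sketch below requests both chart types for a hypothetical input sequence. Opening a new figure before each call is a precaution added here so that the second chart does not replace the first. It is not a documented requirement.

```matlab
seq = 'TTATGACGTTATTCTACTTTGATTGTGCGAGACAATGC';   % hypothetical input sequence

figure
Dimers = dimercount(seq, 'Chart', 'bar');   % bar chart of dimer proportions

figure
dimercount(seq, 'Chart', 'pie');            % pie chart of the same proportions
```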

Examples


seq = randseq(100)
seq = 
'TTATGACGTTATTCTACTTTGATTGTGCGAGACAATGCTACCTTACCGGTCGGAACTCGATCGGTTGAACTCTATCACGCCTGGTCTTCGAAGTTAGCAC'
[Dimers, Percent] = dimercount(seq)
Dimers = struct with fields:
    AA: 4
    AC: 9
    AG: 3
    AT: 6
    CA: 3
    CC: 3
    CG: 8
    CT: 9
    GA: 8
    GC: 4
    GG: 4
    GT: 6
    TA: 7
    TC: 8
    TG: 7
    TT: 10

Percent = 4×4

    0.0404    0.0909    0.0303    0.0606
    0.0303    0.0303    0.0808    0.0909
    0.0808    0.0404    0.0404    0.0606
    0.0707    0.0808    0.0707    0.1010
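
The values in Percent are consistent with each count divided by the total number of dimers (99 for this 100-nucleotide sequence). The sketch below rebuilds the matrix from the counts shown above using only base MATLAB functions. The variable names counts, total, and P are illustrative, and the layout assumes the field order AA through TT shown in the output.

```matlab
% Dimer counts copied from the example output above.
Dimers = struct('AA',4, 'AC',9, 'AG',3, 'AT',6, ...
                'CA',3, 'CC',3, 'CG',8, 'CT',9, ...
                'GA',8, 'GC',4, 'GG',4, 'GT',6, ...
                'TA',7, 'TC',8, 'TG',7, 'TT',10);

counts = cell2mat(struct2cell(Dimers));   % 16-by-1 vector in field order
total  = sum(counts);                     % 99 dimers
P = reshape(counts, 4, 4).' / total;      % transpose so rows index the first nucleotide

disp(round(P, 4))                         % compare with Percent above
```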

Version History

Introduced before R2006a