seqwordcount

Count number of occurrences of word in sequence

Syntax

seqwordcount(Seq, Word)

Arguments

Seq

Character vector or string containing a nucleotide or amino acid sequence. You can also enter a structure with the field Sequence.

Word

Short sequence of characters to search for in Seq.

Description

seqwordcount(Seq, Word) returns the number of times that Word appears in Seq.

If Word contains ambiguous characters (nucleotide or amino acid symbols that represent more than one possible symbol), then seqwordcount counts all matches. For example, the symbol R represents either G or A (the purines), so if Word is 'ART', then seqwordcount counts occurrences of both 'AAT' and 'AGT'.
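
As a rough illustration of the matching rule described above, the following sketch expands a nucleotide word into a regular expression and counts the matches with base MATLAB functions. It is not the toolbox implementation: codes and classes are hypothetical helper variables, they assume the standard IUPAC nucleotide ambiguity codes, and amino acid symbols are not handled.

```matlab
% Hypothetical lookup tables (an assumption, not part of seqwordcount):
% IUPAC nucleotide ambiguity codes and the character classes they stand for.
codes   = {'R','Y','K','M','S','W','B','D','H','V','N'};
classes = {'[AG]','[CT]','[GT]','[AC]','[CG]','[AT]', ...
           '[CGT]','[AGT]','[ACT]','[ACG]','[ACGT]'};

seq  = 'CAATGAGTC';   % made-up sequence for illustration
word = 'ART';

% Replace each ambiguity code with its character class. The classes contain
% only A, C, G, T, and brackets, so later replacements cannot alter them.
pattern = upper(word);
for k = 1:numel(codes)
    pattern = strrep(pattern, codes{k}, classes{k});
end

% regexp returns the start index of each non-overlapping match.
count = numel(regexp(seq, pattern));
```

Here pattern becomes 'A[AG]T', and count is 2 because the made-up sequence contains one 'AAT' and one 'AGT'.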

Examples

seqwordcount does not count overlapping patterns multiple times. In the following example, seqwordcount reports three matches: the first TATA, plus two more in the run TATATATA, which is counted as two distinct matches rather than three overlapping occurrences.

seqwordcount('GCTATAACGTATATATAT','TATA')

ans =
     3
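
The difference between the two counting rules can be checked with base MATLAB functions. This sketch relies on regexp returning only non-overlapping matches and on strfind also returning overlapping start positions. It is an illustration of the counting behavior, not a description of how seqwordcount is implemented.

```matlab
seq  = 'GCTATAACGTATATATAT';
word = 'TATA';

% Non-overlapping matches: after each match, the search resumes
% at the first character following that match.
nonOverlapStarts = regexp(seq, word);    % start positions 3, 10, 14

% All occurrences, including those that overlap a previous match.
allStarts = strfind(seq, word);          % start positions 3, 10, 12, 14

fprintf('Non-overlapping: %d, overlapping: %d\n', ...
    numel(nonOverlapStarts), numel(allStarts));
```

The non-overlapping count of 3 agrees with the result shown above, while counting overlaps would give 4.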

The following example reports two matches ('TAGT' and 'TAAT'). B is the ambiguous code for G, T, or C, and R is the ambiguous code for G or A.

seqwordcount('GCTAGTAACGTATATATAAT','BART')

ans =
     2

Version History

Introduced before R2006a