Check if pattern is substring in documents

Since R2022b



    tf = contains(documents,pat) returns 1 where any token of documents contains pat and returns 0 otherwise.

    tf = contains(documents,pat,IgnoreCase=flag) also specifies whether to ignore letter case when checking substrings.
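
    As a minimal sketch of the IgnoreCase option, the snippet below compares a case-sensitive check with a case-insensitive one. The sample sentences and the variable names tfSensitive and tfInsensitive are illustrative, and the sketch assumes that case-sensitive matching is the behavior when IgnoreCase is not specified.

```matlab
% Sample documents; tokenizedDocument keeps the original letter case.
documents = tokenizedDocument([
    "An Example of a Short sentence"
    "a second short sentence"]);

% Case-sensitive check: "short" does not match the capitalized "Short".
tfSensitive = contains(documents,"short");

% Case-insensitive check: "Short" and "short" are both treated as matches.
tfInsensitive = contains(documents,"short",IgnoreCase=true);
```

    Under these assumptions, only the second document matches in the case-sensitive check, while both documents match when letter case is ignored.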


    Use the contains function to check substrings of the words in documents by specifying substrings or patterns. To check entire words and n-grams in documents, use the containsWords and containsNgrams functions respectively.
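
    The difference between substring and whole-word checks can be sketched as follows. The call form containsWords(documents,words) is an assumption based on the function name mentioned above, and the variable names are illustrative.

```matlab
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"]);

% Substring check: "ex" occurs inside the token "example" in the first document.
tfSubstring = contains(documents,"ex");

% Whole-word check (assumed call form): no token is exactly the word "ex".
tfWord = containsWords(documents,"ex");
```

    Here the substring check is expected to flag the first document, whereas the whole-word check should flag neither, because "ex" never appears as a complete token.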


    Create an array of tokenized documents.

    documents = tokenizedDocument([
        "an example of a short sentence" 
        "a second short sentence"]);

    Check for matches of the string "short".

    tf = contains(documents,"short")
    tf = 2x1 logical array

       1
       1

    Check for matches of the string "ex".

    tf = contains(documents,"ex")
    tf = 2x1 logical array

       1
       0

    Input Arguments

    documents: Input documents, specified as a tokenizedDocument array.

    pat: Substring or pattern to check, specified as one of these values:

    • String array

    • Character vector

    • Cell array of character vectors

    • pattern array

    If pat contains multiple substrings or patterns, then the function returns 1 if any matching substrings or patterns appear in the corresponding document.
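
    A short sketch of passing several substrings, and of passing a pattern object, is shown below. The third sample sentence and the variable names are illustrative; digitsPattern is a built-in MATLAB pattern function used here as one possible pattern input.

```matlab
documents = tokenizedDocument([
    "an example of a short sentence"
    "a second short sentence"
    "room 101 is closed"]);

% Several substrings: a document matches if it contains any one of them.
tfAny = contains(documents,["ex" "sec"]);

% A pattern object: match documents with a token containing a run of digits.
tfDigits = contains(documents,digitsPattern);

% Use the logical result to keep only the matching documents.
withDigits = documents(tfDigits);
```

    In this sketch, the first two documents match the substring list ("example" and "second"), and only the third document matches the digit pattern.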

    IgnoreCase: Option to ignore case, specified as one of these values:

    • 0 (false) – Treat candidate matches that differ only by letter case as nonmatching.

    • 1 (true) – Treat candidate matches that differ only by letter case as matching.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | logical

    Version History

    Introduced in R2022b