How to search a string with multiple rows for text?

Question

Hello. After running seq = getgenpept('NP_036795'); I want to search seq.Features for a text value such as 'Protein'. I have been unable to find the correct function to search a string with multiple rows.

Running k = strfind(seq.Features,'Protein') fails with "Error using strfind. Input strings must have one row."

Any thoughts? Best, Joe

per isakson on 27 Mar 2015

Edited: per isakson on 27 Mar 2015

Excerpt from the documentation of getgenpept:

Features: [40x64 char]

strfind cannot handle multi-row character arrays.

What does this array of characters look like? BTW: using for-loops is allowed.
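A minimal sketch of the row-by-row idea, assuming the Features field is an ordinary padded character array. The variable features below is a hypothetical stand-in for seq.Features; cellstr turns each row into its own cell, and strfind accepts a cell array of character vectors.

```matlab
% Hypothetical stand-in for seq.Features (char pads the rows to equal width)
features = char( { 'source   1..116'
                   'Protein  1..116'
                   '         /product="vesicle-associated membrane protein 2"' } );

rows    = cellstr( features );          % one cell per row, trailing blanks removed
hits    = strfind( rows, 'Protein' );   % cell array: start positions within each row
row_idx = find( ~cellfun( @isempty, hits ) );   % rows that contain the text
```

The search is case-sensitive, so the lowercase "protein" in the third row is not reported. A plain for-loop over the rows with strfind( features(rr,:), 'Protein' ) would do the same job.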

Luuk van Oosten on 28 Mar 2015

Looks like the pic below.

What kind of info are you trying to extract from 'Protein'?

Answer 1

per isakson on 28 Mar 2015

Edited: per isakson on 29 Mar 2015

I guess this block of characters is easier to read on screen than to parse automatically. Regarding "find the correct function": I don't think there is a single function for this; a small program is needed. Anyhow, the script below creates a structure, sas, which is a start.

    %% Create test data. (The OCR program dropped most underscores and misread some closing quotes as ^.)
    buf = { 'source   1..116                                                '
            '         /organism="Rattus norvegicus"                         '
            '         /dbxref="taxon: 10116^                                '
            '         /chromosome=^10^                                      '
            '         /map="10824"                                          '
            'Protein  1..116                                                '
            '         /product="vesicle-associated membrane protein 2^      '
            '         /note="VAMP-2; synaptobrevin-2; Synaptobrevin 2       '
            '         (vesicle-associated membrane protein VAMP-2);         '
            '         Vesicle-associated membrane protein (synaptobrevin 2)"'
            '         /calculated mol wt=12560                              '
            'Region   28..101                                               '
            '         /region name="Synaptobrevin"                          '
            '         /note="Synaptobrevin; pfam00957"                      '
            '         /dbxref="CDD:250253"                                  '
            'Site     95..114                                               '
            '         /site type="transmembrane region"                     '
            '         /inference="non-experimental evidence, no additional  '
            '         details recorded"                                     '
            '         /note="propagated from UniProt./Swiss-Prot (P63045.2).'
            'CDS      1..116                                                '
            '         /gene="Vamp2^                                         '
            '         /gene synonym="RATVAMPB; RATVAMPIR; SYS; Syb2^        '
            '         /coded by="NM 012663.2:83..433"                       '
            '         /dbxref="GeneID:24803^                                '
            '         /dbxref="RGD:3949"                                    '};
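    str_array = char( buf );
    %% Read and parse
    sas = struct();     % start clean so that reruns do not append to old fields
    for rr = 1 : size( str_array, 1 )
        % search rows starting with a word, followed by digits, two ".", digits
        tok = regexp( str_array(rr,:), '^(\w+)\s+(\d+\.{2}\d+)', 'tokens' );
        if not( isempty( tok ) )
            % header row: start a new field holding the location, e.g. '1..116'
            field_name = tok{1}{1};
            sas.(field_name) = tok{1}(2);
        else
            % continuation row: append the trimmed text to the current field
            sas.(field_name) = cat( 1, sas.(field_name)         ...
                                ,   strtrim( str_array(rr,:) )  );
        end
    end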

The structure, sas, has one field for each sub-group:

    >> sas
    sas = 
         source: {5x1 cell}
        Protein: {6x1 cell}
         Region: {4x1 cell}
           Site: {5x1 cell}
            CDS: {6x1 cell}
    >> sas.Protein
    ans = 
        '1..116'
        '/product="vesicle-associated membrane protein 2^'
        '/note="VAMP-2; synaptobrevin-2; Synaptobrevin 2'
        '(vesicle-associated membrane protein VAMP-2);'
        'Vesicle-associated membrane protein (synaptobrevin 2)"'
        '/calculated mol wt=12560'
    >> char( sas.Protein )
    ans =
    1..116                                                
    /product="vesicle-associated membrane protein 2^      
    /note="VAMP-2; synaptobrevin-2; Synaptobrevin 2       
    (vesicle-associated membrane protein VAMP-2);         
    Vesicle-associated membrane protein (synaptobrevin 2)"
    /calculated mol wt=12560                              
    >>

The next step is to parse the sub-blocks.
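One possible way to parse a sub-block, offered as a sketch rather than a finished parser. It assumes clean input in which qualifier names contain underscores and values are closed with double quotes, so the test block below is a hand-corrected, hypothetical version of sas.Protein. The names block, qualifiers, key and txt are made up for illustration.

```matlab
% Hypothetical, hand-cleaned sub-block (underscores and closing quotes restored)
block = { '1..116'
          '/product="vesicle-associated membrane protein 2"'
          '/note="VAMP-2; synaptobrevin-2; Synaptobrevin 2'
          '(vesicle-associated membrane protein VAMP-2);'
          'Vesicle-associated membrane protein (synaptobrevin 2)"'
          '/calculated_mol_wt=12560' };

qualifiers = struct();
key = '';
for kk = 2 : numel( block )             % block{1} holds the location, e.g. '1..116'
    txt = block{kk};
    % a qualifier row looks like /name=value
    tok = regexp( txt, '^/(\w+)=(.*)$', 'tokens', 'once' );
    if ~isempty( tok )
        key = tok{1};
        qualifiers.(key) = tok{2};
    elseif ~isempty( key )
        % continuation row: glue it to the value of the current qualifier
        qualifiers.(key) = [ qualifiers.(key), ' ', txt ];
    end
end

% drop the surrounding double quotes from the values
names = fieldnames( qualifiers );
for kk = 1 : numel( names )
    qualifiers.(names{kk}) = strrep( qualifiers.(names{kk}), '"', '' );
end
```

With this input, qualifiers.product holds the product name and qualifiers.calculated_mol_wt holds '12560' as text. A qualifier that occurs more than once in a block, such as dbxref in the CDS block, would overwrite its earlier value here, so collecting the values in a cell array may be preferable for real data.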
