How to see if characters are present in a string array.

I am trying to write some code that takes a short amino acid sequence, e.g. 'GSA', and searches through a string array of sequences to find the number and indices of matches, ignoring the order of the characters. As long as each character is present, I would like to count it as a hit.
Here is the code I have so far, which kind of works. InputSeq is the sequence I want to search for, and AAseq is the string array of sequences I am searching through. This code only produces a match if all characters are present AND in the correct order.
InputSeq = "GSA";
AAseq = ["SGD"; "SGS"; "SGA"; "SGV"; "SGS"; "SGA"; "SGD"; "SGS"; "SGS"; "SGY"; "SGD"; "SGS"; "SGI"]; % remaining sequences omitted in the original post
result = ismember(InputSeq, AAseq)
This kind of works, but it will not register a match if the order of the characters does not match.

Accepted Answer

Stephen23 on 3 Dec 2021
Edited: Stephen23 on 3 Dec 2021
Assuming that all string elements contain exactly the same number of characters, you can do this easily with basic logical operations on character arrays:
A = "GSA";
B = ["SGD";"SGS";"SGA";"SGV";"SGS";"SGA";"SGD";"SGS";"SGS";"SGY";"SGD";"SGS";"SGI"]
B = 13×1 string array
"SGD" "SGS" "SGA" "SGV" "SGS" "SGA" "SGD" "SGS" "SGS" "SGY" "SGD" "SGS" "SGI"
X = all(sort(char(A))==sort(char(B),2),2)
X = 13×1 logical array
0 0 1 0 0 1 0 0 0 0 0 0 0
Or without sorting:
X = all(any(char(A)==permute(char(B),[1,3,2]),3),2)
X = 13×1 logical array
0 0 1 0 0 1 0 0 0 0 0 0 0
3 comments
Stephen23 on 3 Dec 2021
You don't need the loop; you can simply specify the dimension argument of sort:
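The two versions above agree for 'GSA', but they are not equivalent when a letter repeats. The sort-based test requires the same letters with the same counts, while the any/permute test only checks that each letter of A appears somewhere in the row. The sketch below uses a hypothetical query "GSS" and a small hypothetical list to show the difference; both lines rely on implicit expansion (R2016b or later).

```matlab
% Hypothetical query and sequences, chosen to expose repeated letters.
A = "GSS";
B = ["SGA"; "SGS"; "SSG"; "GGS"];

% Sort-based test: letters must match as a multiset (same counts).
Xsort = all(sort(char(A)) == sort(char(B), 2), 2);

% Presence-based test: each letter of A only has to occur somewhere in the row.
Xany = all(any(char(A) == permute(char(B), [1, 3, 2]), 3), 2);

% Column 1: sort-based result, column 2: presence-based result.
disp([Xsort, Xany])
```

Here Xsort is true only for "SGS" and "SSG", whereas Xany is true for all four rows, including "SGA" (only one S) and "GGS". Which behavior is wanted depends on whether repeated residues should count.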
A = 'GSA'
A = 'GSA'
B = ['SGD';'SGS';'SGA';'SGV';'SGS';'SGA';'SGD';'SGS';'SGS';'SGY';'SGD';'SGS';'SGI']
B = 13×3 char array
'SGD' 'SGS' 'SGA' 'SGV' 'SGS' 'SGA' 'SGD' 'SGS' 'SGS' 'SGY' 'SGD' 'SGS' 'SGI'
X = all(sort(A)==sort(B,2),2)
X = 13×1 logical array
0 0 1 0 0 1 0 0 0 0 0 0 0
Elijah Roberts on 3 Dec 2021
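The question also asks for the number and the indices of the matches. A minimal sketch, reusing the char-array version above, converts the logical vector X with find and nnz:

```matlab
A = 'GSA';
B = ['SGD';'SGS';'SGA';'SGV';'SGS';'SGA';'SGD';'SGS';'SGS';'SGY';'SGD';'SGS';'SGI'];

% True where the row is a rearrangement of A.
X = all(sort(A) == sort(B, 2), 2);

idx = find(X);   % row indices of the matching sequences
n   = nnz(X);    % number of matching sequences

fprintf('%d matches at rows: %s\n', n, mat2str(idx.'));   % prints: 2 matches at rows: [3 6]
```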
Yep, you're right! That worked. Thank you!

More Answers (1)

Walter Roberson on 2 Dec 2021
You could use multiple contains() tests.
But I suggest that instead you do something like
ismember(sort(char(InputSeq)), cellfun(@sort, cellstr(AAseq), 'uniform', 0))
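The first suggestion, multiple contains() tests, is not spelled out above. One possible sketch runs one contains() call per letter of the query and combines the columns with all(); the loop structure and the variable names letters, hits and isMatch are illustrative choices, not from the thread.

```matlab
InputSeq = "GSA";
AAseq = ["SGD"; "SGS"; "SGA"; "SGV"; "SGS"; "SGA"; "SGD"; "SGS"; "SGS"; "SGY"; "SGD"; "SGS"; "SGI"];

letters = char(InputSeq);                     % query as individual characters
hits = false(numel(AAseq), numel(letters));   % one column per query letter
for k = 1:numel(letters)
    hits(:, k) = contains(AAseq, letters(k)); % does each sequence contain this letter?
end
isMatch = all(hits, 2);                       % every letter present, in any order
```

Like the any/permute version in the accepted answer, this only tests presence, so repeated letters in the query are not counted.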
2 comments
Elijah Roberts on 2 Dec 2021
Edited: Elijah Roberts on 2 Dec 2021
That only returns true or false, i.e. "InputSeq is found somewhere in AAseq." I would like to get a logical array the same size as AAseq so that I can get all of the indices of the matching sequences.
I had some luck with the following. I also trimmed the input sequence down to 'GS', and the AAseq entries are all two characters long as well:
Matches = ismember(InputSeq, AAseq); % both variables are char arrays
This gave me a 96x2 logical array. Column one seems to be "is G a member" and column two "is S a member".
This kind of works for me. If I can get the row indices where both columns are true, I will be good.
I tried this:
MatchIndex = find(Matches == [1 1])
but it just gave me every index where there is a 1, rather than the indices where both columns are 1.
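Matches == [1 1] compares element by element, so find then returns the linear index of every true element. Collapsing each row with all(…, 2) before calling find gives row indices instead. The 4x2 array below is a hypothetical stand-in for the 96x2 Matches; this addresses only the indexing step and does not change what Matches itself tests.

```matlab
% Hypothetical stand-in for the 96x2 Matches array.
Matches = logical([1 1;
                   1 0;
                   0 1;
                   1 1]);

bothTrue   = all(Matches, 2);  % true only where every column in the row is true
MatchIndex = find(bothTrue);   % row indices, here [1; 4]
```

For exactly two columns, find(Matches(:,1) & Matches(:,2)) is equivalent.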
Walter Roberson on 2 Dec 2021
ismember( cellfun(@sort, cellstr(AAseq), 'uniform', 0), sort(char(InputSeq)) )
You could also use strcmp().
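A possible strcmp() version of the same idea: sort the letters inside each sequence, then compare all of them with the sorted query in one call. Because strcmp compares whole strings, this sketch also works when the sequences have different lengths, unlike the char-matrix approach in the accepted answer.

```matlab
InputSeq = "GSA";
AAseq = ["SGD"; "SGS"; "SGA"; "SGV"; "SGS"; "SGA"; "SGD"; "SGS"; "SGS"; "SGY"; "SGD"; "SGS"; "SGI"];

% Sort the letters inside each sequence so that order no longer matters.
sortedSeqs = cellfun(@sort, cellstr(AAseq), 'UniformOutput', false);

% Compare every sorted sequence with the sorted query; result is 13x1 logical.
isMatch = strcmp(sortedSeqs, sort(char(InputSeq)));
MatchIndex = find(isMatch);   % row indices of the matches
```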


Release

R2019b