Finding the repeated substrings

Question

Reshma Ravi on 1 Jun 2017


https://es.mathworks.com/matlabcentral/answers/342796-finding-the-repeated-substrings

Answered: Steven Lord on 14 Aug 2019

I have a DNA sequence, AAGTCAAGTCAATCG, and I split it into substrings such as AAGT, AGTC, GTCA, TCAA, CAAG, AAGT and so on. I then have to find the repeated substrings and their frequency counts. For example, AAGT is repeated twice here, so I want to get AAGT - 2. How can I do this?

2 comments

Stephen23 on 1 Jun 2017

See Andrei Bobrov's answer for an efficient solution.

Andrei Bobrov on 2 Jun 2017

Thank you Stephen!


Answer 1

KSSV on 1 Jun 2017


https://es.mathworks.com/matlabcentral/answers/342796-finding-the-repeated-substrings#answer_269149


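str = {'AAGT','AGTC','GTCA','TCAA','CAAG','AAGT'};
% For each unique substring, find the positions where it occurs in str
idx = cellfun(@(x) find(strcmp(str, x)), unique(str), 'UniformOutput', false);
L = cellfun(@numel, idx);   % number of occurrences of each unique substring
Ridx = find(L > 1);         % unique substrings that occur more than once
for i = 1:numel(Ridx)
    st = str(idx{Ridx(i)});  % index with Ridx(i), not with the whole vector Ridx
    fprintf('%s string repeated %d times\n', st{1}, numel(idx{Ridx(i)}))
end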
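
The cell array str in this answer is typed by hand and holds only the first six substrings from the question. As a minimal sketch, the full list of overlapping substrings could be generated from the sequence with a loop. The names seq, k and n are illustrative, and the window length of 4 is assumed from the examples in the question.

```matlab
seq = 'AAGTCAAGTCAATCG';
k = 4;                        % window length, assumed from the question's examples
n = numel(seq) - k + 1;       % number of overlapping windows
str = cell(1, n);
for i = 1:n
    str{i} = seq(i:i+k-1);    % substring starting at position i
end
```

The resulting str can then be passed to the counting code above in place of the hand-typed list.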


Answer 2

Andrei Bobrov on 1 Jun 2017


https://es.mathworks.com/matlabcentral/answers/342796-finding-the-repeated-substrings#answer_269150


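A = 'AAGTCAAGTCAATCG';
% Each row of the Hankel matrix is one length-4 sliding window of A
B = hankel(A(1:end-3),A(end-3:end));
% Unique windows in order of first appearance; c maps each row of B to its unique row
[a,~,c] = unique(B,'rows','stable');
% Count the occurrences of each unique window and put them in a table
out = table(a,accumarray(c,1),'VariableNames',{'DNA','counts'});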
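
The table above lists every distinct window, while the question asks only for the repeated ones. A possible extension, offered as a sketch rather than as part of the original answer, parameterizes the window length and keeps only rows whose count exceeds 1. The variable k, the cellstr conversion and the name repeated are additions for illustration. Like the original, it relies on hankel accepting char input.

```matlab
A = 'AAGTCAAGTCAATCG';
k = 4;                                       % window length (assumption: 4, as in the question)
B = hankel(A(1:end-k+1), A(end-k+1:end));    % one length-k window per row
[a, ~, c] = unique(B, 'rows', 'stable');     % unique windows, first-appearance order
counts = accumarray(c, 1);                   % occurrences of each unique window
out = table(cellstr(a), counts, 'VariableNames', {'DNA','counts'});
repeated = out(out.counts > 1, :);           % keep only windows seen more than once
```

For the sequence in the question, repeated should contain AAGT, AGTC, GTCA and TCAA, each with a count of 2.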

5 comments (3 older comments not shown)

Stephen23 on 26 Aug 2018

tabulate requires the Statistics and Machine Learning Toolbox, which not everyone has.
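
For readers without that toolbox, a tabulate-style summary (value, count, percent) can be approximated with base MATLAB functions. This is only a sketch; the variable names are illustrative and the output format is not meant to match tabulate exactly.

```matlab
str = {'AAGT','AGTC','GTCA','TCAA','CAAG','AAGT'};
[vals, ~, ic] = unique(str);          % sorted unique substrings and index map
counts = accumarray(ic(:), 1);        % occurrences of each unique substring
pct = 100 * counts / numel(str);      % share of each substring in percent
for i = 1:numel(vals)
    fprintf('%s  %d  %.2f%%\n', vals{i}, counts(i), pct(i));
end
```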

Ivan Savelyev on 14 Aug 2019

Hi.

I have a question. Sometimes I get ladder-like results (nested sequences) like this:

AAAAAAAAA, which is counted (with frame size 4) as 6 AAAA sequences. That is not correct in some cases (the same applies to ATATATA-type sequences). Is there a solution or an algorithm to filter out nested repeats?

Thanks a lot.
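
The thread does not answer this directly. One possible reading of "filtering nested repeats" is to count only non-overlapping occurrences of a given motif, scanning left to right and skipping any match that starts inside a previously accepted one. The sketch below assumes that reading; seq, motif, nextFree and count are illustrative names.

```matlab
seq = 'AAAAAAAAA';
motif = 'AAAA';
k = numel(motif);
pos = strfind(seq, motif);      % all start positions, overlapping matches included
count = 0;
nextFree = 1;                   % first position not covered by an accepted match
for p = pos
    if p >= nextFree            % match does not overlap the previous accepted one
        count = count + 1;
        nextFree = p + k;
    end
end
```

Here pos holds 6 overlapping start positions, while count ends at 2. The same scan applied to ATATATA with the motif ATA keeps 2 of the 3 overlapping matches.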


Answer 3

Steven Lord on 14 Aug 2019


https://es.mathworks.com/matlabcentral/answers/342796-finding-the-repeated-substrings#answer_387601


For the original question you could convert the char data into a categorical array and call histcounts.

>> C = categorical({'AAGT','AGTC','GTCA','TCAA','CAAG','AAGT'})
C = 
  1×6 categorical array
     AAGT      AGTC      GTCA      TCAA      CAAG      AAGT 
>> [counts, uniquevalues] = histcounts(C)
counts =
     2     1     1     1     1
uniquevalues =
  1×5 cell array
    {'AAGT'}    {'AGTC'}    {'CAAG'}    {'GTCA'}    {'TCAA'}
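
To get from these two outputs to the "AAGT - 2" form the question asks for, the counts can be used as a logical mask over the category names. This is a small sketch on top of the answer, not part of it; isRepeated is an illustrative name.

```matlab
C = categorical({'AAGT','AGTC','GTCA','TCAA','CAAG','AAGT'});
[counts, uniquevalues] = histcounts(C);
isRepeated = counts > 1;                 % categories that occur more than once
for i = find(isRepeated)
    fprintf('%s - %d\n', uniquevalues{i}, counts(i));
end
```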
