Getting strings to combine multiple times

Matthew Zehner on 28 Apr 2016
Commented: Walter Roberson on 28 Apr 2016
I have a school project in which I have to group a DNA sequence of 550437 letters into codons. At the moment I have it stored with 1 letter per cell, 550437 cells in total. I have to show how many times AAA, ATC, and CGG show up in that sequence without overlap. I also have to show the locations of the first 10 of each. I've tried reshaping from 550437x1 to 183479x3, but the letters don't line up in groups of three from left to right: column 1 holds the first 183479 letters, column 2 the next 183479, and column 3 the final 183479. I would like either to group every 3 cells into one cell, or to get the numeric positions at which each selected sequence shows up. Here's what I have so far to count how many times each sequence shows up. What I can't figure out is how to find where the first 10 instances of each show up.
x=1;
i=1; % Loop variable for AAA
h=1; % Loop variable for ATC
t=1; % Loop variable for CGG
AAAmatch=0; % Letters matched at the current position (3 = exact match)
ATCmatch=0; % Letters matched at the current position (3 = exact match)
CGGmatch=0; % Letters matched at the current position (3 = exact match)
AAAcount=0; % Counter for AAA matches
ATCcount=0; % Counter for ATC matches
CGGcount=0; % Counter for CGG matches
% Counts AAA at every start position in the sequence
% (note: the loop advances one letter at a time, so overlapping matches are counted too)
for i=1:length(DNA)-2
    if strcmp(DNA(i),'A')
        AAAmatch=AAAmatch+1;
    end
    if strcmp(DNA(i+1),'A')
        AAAmatch=AAAmatch+1;
    end
    if strcmp(DNA(i+2),'A')
        AAAmatch=AAAmatch+1;
    end
    if AAAmatch==3
        AAAcount=1+AAAcount;
    end
    AAAmatch=0;
end
% Counts ATC at every start position in the sequence
for h=1:length(DNA)-2
    if strcmp(DNA(h),'A')
        ATCmatch=ATCmatch+1;
    end
    if strcmp(DNA(h+1),'T')
        ATCmatch=ATCmatch+1;
    end
    if strcmp(DNA(h+2),'C')
        ATCmatch=ATCmatch+1;
    end
    if ATCmatch==3
        ATCcount=1+ATCcount;
    end
    ATCmatch=0;
end
% Counts CGG at every start position in the sequence
for t=1:length(DNA)-2
    if strcmp(DNA(t),'C')
        CGGmatch=CGGmatch+1;
    end
    if strcmp(DNA(t+1),'G')
        CGGmatch=CGGmatch+1;
    end
    if strcmp(DNA(t+2),'G')
        CGGmatch=CGGmatch+1;
    end
    if CGGmatch==3
        CGGcount=1+CGGcount;
    end
    CGGmatch=0;
end
Thoughts?
  1 comment
Azzi Abdelmalek on 28 Apr 2016
You can make your question clear and brief by posting a small example together with the expected result. You can also add some explanation.


Answers (1)

Walter Roberson on 28 Apr 2016
Consider using strfind(). You will need some logic to detect a potential overlap between the final character of one match and the first character of the next. Also, if you had something like 'AAAA', then strfind() of 'AAA' would return both 1 and 2 (that is, strfind does not care about overlaps). Still, strfind() will give you candidate positions that you can winnow down.
What would you want the result to be if 'AAATCGG' appeared in the sequence? Is that one AAA and one CGG, or is it one ATC?
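As a rough illustration of the winnowing step for a single pattern, here is a minimal sketch. It assumes the sequence is already an ordinary character vector, and it assumes "without overlap" means a greedy left-to-right rule in which a hit is dropped if it starts inside the previously kept match. The short DNA value and the names hits, keep, lastEnd, and first10 are made up for the example. It does not settle the cross-pattern question about 'AAATCGG'.

```matlab
% Hypothetical short sequence standing in for the real data
DNA = 'AAAATCAAAAAACGG';

% All start positions of 'AAA', overlapping ones included
hits = strfind(DNA, 'AAA');

% Greedy left-to-right filter (an assumption about what "no overlap" means):
% keep a hit only if it starts after the previously kept match has ended
keep = [];
lastEnd = 0;
for k = 1:numel(hits)
    if hits(k) > lastEnd
        keep(end+1) = hits(k); %#ok<AGROW>
        lastEnd = hits(k) + 2;   % each match is 3 characters long
    end
end

AAAcount = numel(keep);                    % number of non-overlapping matches
first10  = keep(1:min(10, numel(keep)));   % positions of the first 10 (or fewer)
```

The same loop can be repeated with 'ATC' and 'CGG' in place of 'AAA'. Those two patterns cannot overlap with themselves, so for them the filter leaves the strfind() output unchanged.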
  2 comments
Matthew Zehner on 28 Apr 2016
Edited: Matthew Zehner on 28 Apr 2016
I've tried strfind. Since I'm working with cells that each hold a single letter, it doesn't work. I need to find AAA, ATC, and CGG individually. On my data strfind only returns [1] for a cell that matches or [] for one that doesn't, and I only get that when I search for a single letter rather than the 3 letters together. I don't get a numeric position as I would with a normal string like DNA='ATCAAACGGATCAACGTACAGTCATAC'. That would work rather easily. But since I have an array with over half a million cells, strfind just tells me whether the letter I'm looking for is there or not. It doesn't tell me the positions.
Walter Roberson on 28 Apr 2016
Use horzcat(DNA{:}) and the result will be a string.
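A minimal sketch of how that could be combined with the reshape idea from the question, assuming "without overlap" means codons read in a fixed frame starting at the first letter (letters 1-3, 4-6, and so on). reshape fills column by column, so reshaping to 3 rows and then transposing puts one codon in each row. The tiny DNA cell array and the names seq, codons, isATC, codonIdx, and letterPos are invented for the example.

```matlab
% Hypothetical cell array with one letter per cell, like the 550437x1 input
DNA = {'A';'T';'C';'A';'A';'A';'C';'G';'G';'A';'T';'C'};

% Join the cells into one character vector
seq = horzcat(DNA{:});

% reshape fills columns first, so build a 3-by-N array and transpose it
% to get one codon per row; any leftover letters at the end are ignored
nCodons = floor(numel(seq) / 3);
codons  = reshape(seq(1:3*nCodons), 3, []).';

% Compare every row with the target codon
isATC    = strcmp(cellstr(codons), 'ATC');
ATCcount = nnz(isATC);

% First 10 matches (or fewer), as codon numbers and as letter positions
codonIdx  = find(isATC, 10);
letterPos = 3*codonIdx - 2;
```

For the full data set this would give a 183479x3 character array. If the intended reading frame is different, or if matches may start at any letter, this sketch does not apply as written.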
