Searching for a given line in a text file

Ram on 28 Feb 2011
The following is a text file in SDF format (chemical structures). It looks something like this:
7 9 1 0 0 0 0
7 14 1 0 0 0 0
8 10 1 0 0 0 0
8 15 1 0 0 0 0
9 10 2 0 0 0 0
9 16 1 0 0 0 0
10 17 1 0 0 0 0
12 13 1 0 0 0 0
13 18 1 0 0 0 0
13 19 1 0 0 0 0
13 20 1 0 0 0 0
M END
> <PUBCHEM_COMPOUND_CID>
2244
> <PUBCHEM_COMPOUND_CANONICALIZED>
1
> <PUBCHEM_CACTVS_COMPLEXITY>
212
I need to extract just the value under the CID number field, and there can be multiple CID number fields in a single file. How should I go about this? Any help would be appreciated.

Accepted Answer

Ram on 1 Mar 2011
I tried something like this:
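[A,B]=uigetfile('*.sdf','sdf');
C=fopen(fullfile(B,A),'r');    % uigetfile returns the file name and folder separately
nStruct=input('Number of structures: ');   % number of structures, obtained from the user
pubchem_id=[];
z=nStruct*300;                 % rough approximation: about 300 lines per structure
for j=1:z
    D=fgetl(C);
    if ~ischar(D)              % fgetl returns -1 at end of file
        break;
    end
    if strcmp('> <PUBCHEM_COMPOUND_CID>',strtrim(D))
        E=fgetl(C);            % the CID is on the line after the tag
        pubchem_id=[pubchem_id; str2double(E)];
    end
end
fclose(C);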
and it worked :)
2 comments
David Young on 1 Mar 2011
The for loop that reads only 300 lines per structure is a hostage to fortune: what if a structure has more than 300 lines? You could avoid this by using a while loop that keeps reading until it either finds a particular line or reaches the end of the file, which would be far more robust.
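A minimal sketch of the while-loop approach described above, written for illustration rather than taken from the thread. It writes a hypothetical two-record sample to a temporary file, then reads lines until fgetl returns -1 (how MATLAB signals end of file), so neither a line count nor a structure count is needed. It assumes, as the accepted answer does, that the tag line reads '> <PUBCHEM_COMPOUND_CID>' after trimming and that the CID sits on the next line.

```matlab
% Hypothetical two-record sample written to a temporary file for illustration.
fname = [tempname, '.sdf'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'M  END', '> <PUBCHEM_COMPOUND_CID>', '2244', '$$$$', ...
    'M  END', '> <PUBCHEM_COMPOUND_CID>', '2519', '$$$$');
fclose(fid);

fid = fopen(fname, 'r');
pubchem_id = [];
tline = fgetl(fid);
while ischar(tline)              % fgetl returns -1 (not a char) at end of file
    if strcmp(strtrim(tline), '> <PUBCHEM_COMPOUND_CID>')
        % Assumption: the CID value is on the line right after the tag.
        pubchem_id(end+1, 1) = str2double(fgetl(fid));
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

The loop stops on its own at the end of the file however many structures it holds, so the user no longer has to supply a structure count.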
Ram on 4 Mar 2011
I didn't use a while loop because nothing in an SDF file marks the end of the file. For instance, $$$$ marks the end of each structure, and there can be multiple $$$$ lines depending on the number of structures. A structure has about 180 lines on average, so 300 is actually generous, and structures with more than 300 lines will be compensated for by those with fewer.
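The end of the file does not need a marker inside the data: MATLAB's file functions detect it themselves. As an alternative sketch (a different technique from the line-by-line reading in this thread), the whole file can be read with fileread and the CIDs pulled out with a regular expression; counting the $$$$ lines then gives the number of structures, so it need not be asked of the user. The sample file and its CID values are hypothetical, and the pattern assumes the same tag spelling as above with an integer CID on the following line.

```matlab
% Hypothetical two-record sample written to a temporary file for illustration.
fname = [tempname, '.sdf'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'M  END', '> <PUBCHEM_COMPOUND_CID>', '2244', '$$$$', ...
    'M  END', '> <PUBCHEM_COMPOUND_CID>', '2519', '$$$$');
fclose(fid);

txt = fileread(fname);           % whole file as a single char vector
delete(fname);

% Capture the digits after each CID tag (\s* also consumes the line break).
tok = regexp(txt, '> <PUBCHEM_COMPOUND_CID>\s*(\d+)', 'tokens');
pubchem_id = cellfun(@(t) str2double(t{1}), tok(:));

% The number of structures equals the number of $$$$ terminator lines.
nStructures = numel(regexp(txt, '^\$\$\$\$', 'lineanchors'));
```

Reading the whole file at once is simple for files that fit comfortably in memory; for very large SDF files the line-by-line loop keeps memory use low.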


More Answers (1)

Walter Roberson on 28 Feb 2011
There is not much you can do except read through the file with fgetl() until you encounter the M END line, and do the extraction work from there. How easy the extraction is after that depends on how regular the data after that line are and on which fields you are interested in.
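One way to follow this suggestion, sketched under assumptions not stated in the thread: skip each structure's molfile block until its M END line (matched with M, whitespace, END, so both "M END" and "M  END" work), then treat every line of the form > <NAME> as a data-item tag whose single-line value follows it, and close the record at $$$$. Field names are taken directly from the tags, so this assumes they are valid MATLAB identifiers (true for the PUBCHEM_ names shown); multi-line values are not handled. The sample file is hypothetical, and the second record's values are invented.

```matlab
% Hypothetical two-record sample; the second record's values are invented.
fname = [tempname, '.sdf'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '7 9 1 0 0 0 0', 'M  END', ...
    '> <PUBCHEM_COMPOUND_CID>', '2244', '> <PUBCHEM_CACTVS_COMPLEXITY>', '212', '$$$$', ...
    '8 10 1 0 0 0 0', 'M  END', ...
    '> <PUBCHEM_COMPOUND_CID>', '1000', '> <PUBCHEM_CACTVS_COMPLEXITY>', '99', '$$$$');
fclose(fid);

records = {};        % one struct of data items per structure
rec = struct();      % data items of the structure being read
inData = false;      % true after M END, until the $$$$ terminator
fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)
    if ~isempty(regexp(tline, '^M\s+END', 'once'))
        inData = true;                     % molfile block done; data items follow
    elseif strcmp(strtrim(tline), '$$$$')
        records{end+1} = rec;              % close this structure's record
        rec = struct();
        inData = false;
    elseif inData
        tag = regexp(tline, '^>.*<([^>]+)>', 'tokens', 'once');
        if ~isempty(tag)
            val = fgetl(fid);              % assumption: single-line value
            if ischar(val)
                % assumption: tag names are valid MATLAB field names
                rec.(tag{1}) = strtrim(val);
            end
        end
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

% Collect the CID of every structure as numbers.
cids = cellfun(@(r) str2double(r.PUBCHEM_COMPOUND_CID), records);
```

Storing each record as a struct in a cell array tolerates structures that carry different sets of data items, which a struct array would not.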
1 comment
Ram on 1 Mar 2011
Thank you so much :) I built my code based on your reply.
