Export matched lines from two text files

Question

jgillis16 on 15 Aug 2015

Direct link to this question

https://es.mathworks.com/matlabcentral/answers/233997-export-matched-lines-from-two-text-files

Answered: r r on 11 May 2021

I need to identify the lines that the two text files, mwithrm21.txt and virgomrmdist.txt, have in common, based on column 7 of each file. These matches should then be exported to a new text file, and the matched lines should be removed from mwithrm21.txt.

I have attached the text files.

I drafted the code below:

content1 = fileread( 'mwithrm21.txt' ) ;
 content2_rows = strsplit( fileread( 'virgomrmdist.txt' ), sprintf( '\n' )) ;
 found = cellfun( @(s)~isempty(strfind(content1, s)), content2_rows ) ;
 output_rows = content2_rows(found) ;
 fId = fopen( 'similarvclf.txt', 'w' ) ;
 fprintf( fId, '%s\n', output_rows{:} ) ;
 fclose( fId ) ;
 output_rows = content2_rows(~found) ;
 fId = fopen( 'mwithrm21_new.txt', 'w' ) ;  % Remove the '_new' for overwriting original. 
 fprintf( fId, '%s\n', output_rows{:} ) ;
 fclose( fId ) ;

But, I do not know how to make it specific to only searching column 7 and then exporting the entire matched line to a new text file.


Cedric on 15 Aug 2015

Edited: Cedric on 15 Aug 2015

Actually, you say that you want to match rows based on column 7 only, but when they match the other columns don't always match. What do you want to have in the output?

For example:

 file1 : 188.83785|27.56214|-14.4|18.931|0.398|~|SDSSJ123521.05+273343.6
 file2 : 188.83785|27.56214|18.931|0.398|-14.4|~|SDSSJ123521.05+273343.6

Should we export both?

jgillis16 on 15 Aug 2015

I understand, which is why I worked through it (thanks for the reminder email!!).

It doesn't matter, honestly, since the content of the lines is the same, just in a different order. But since my main focus is to export the lines of mwithrm21 that match to a new text file, while removing those matched lines from the original mwithrm21 text file, I would like the exported lines to come from mwithrm21.txt.
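To illustrate the point discussed above, here is a minimal sketch that compares the two example rows quoted earlier, once as whole lines and once on column 7 only. The variable names are illustrative, and the sketch assumes the fields are separated by '|' exactly as in the example.

```matlab
% The two example rows from the comment above: same code in column 7,
% but columns 3 to 5 appear in a different order.
row1 = '188.83785|27.56214|-14.4|18.931|0.398|~|SDSSJ123521.05+273343.6' ;
row2 = '188.83785|27.56214|18.931|0.398|-14.4|~|SDSSJ123521.05+273343.6' ;

% Split each row into its fields. CollapseDelimiters is switched off so
% that an empty field (two consecutive '|') would not shift the columns.
fields1 = strsplit( row1, '|', 'CollapseDelimiters', false ) ;
fields2 = strsplit( row2, '|', 'CollapseDelimiters', false ) ;

sameRow  = strcmp( row1, row2 ) ;              % whole-line comparison
sameCode = strcmp( fields1{7}, fields2{7} ) ;  % column 7 only

fprintf( 'Whole lines equal: %d, column 7 equal: %d\n', sameRow, sameCode ) ;
```

A whole-line search, such as the STRFIND call in the draft, would therefore miss this pair, whereas a comparison restricted to column 7 would not.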


Answer 1

Cedric on 15 Aug 2015

Direct link to this answer

https://es.mathworks.com/matlabcentral/answers/233997-export-matched-lines-from-two-text-files#answer_189428

Edited: Cedric on 16 Aug 2015

Here is a first draft. Test it and let me know if anything is unclear or doesn't work.

 % - Read files content as strings.
 content1 = fileread( 'mwithrm21.txt' ) ;
 content2 = fileread( 'virgomrmdist.txt' ) ;
 % - Extract the last column (column 7) from each file's content.
 codes1 = regexp( content1, '[^|]+(?=(\s|$))', 'match' ) ;
 codes2 = regexp( content2, '[^|]+(?=(\s|$))', 'match' ) ;
 % - Match codes.
 [isMatch_1in2, match_posIn2] = ismember( codes1, codes2 ) ;
 % - Split content. Careful, whatever generates these files still uses
 %  carriage returns (\r) only.
 rows1 = strsplit( content1, char(13) ) ;
 rows2 = strsplit( content2, char(13) ) ;
 % - Output matches (version mwithrm21.txt). Use new line chars (\n) as 
 %  joint instead of carriage returns, change if you prefer \r.
 fId = fopen( 'matches.txt', 'w' ) ;
 fwrite( fId, strjoin( rows1(isMatch_1in2), '\n' )) ;
 fclose( fId ) ;
 % - Output non-matching rows of file 1.
 fId = fopen( 'mwithrm21_reduced.txt', 'w' ) ;
 fwrite( fId, strjoin( rows1(~isMatch_1in2), '\n' )) ;
 fclose( fId ) ;
 % - Output non-matching rows of file 2. Eliminate matching rows first.
 rows2(nonzeros( match_posIn2 )) = [] ;
 fId = fopen( 'virgomrmdist_reduced.txt', 'w' ) ;
 fwrite( fId, strjoin( rows2, '\n' )) ;
 fclose( fId ) ;

EDITs :

Replaced match_posIn2(match_posIn2~=0) with nonzeros( match_posIn2 ) after reading an answer by Matt J in another thread that mentions NONZEROS.
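As a small illustration of this edit, the sketch below runs ISMEMBER on made-up codes and shows that both forms select the same positions. The cell arrays here are hypothetical toy data, not taken from the attached files.

```matlab
% Hypothetical toy data standing in for the codes and rows of the files.
codes1 = {'A', 'B', 'C', 'D'} ;
codes2 = {'D', 'X', 'B'} ;
rows2  = {'rowD', 'rowX', 'rowB'} ;

% Second output: position in codes2 of each element of codes1, 0 if absent.
[isMatch_1in2, match_posIn2] = ismember( codes1, codes2 ) ;

viaMask     = match_posIn2(match_posIn2 ~= 0) ;  % logical indexing, row vector
viaNonzeros = nonzeros( match_posIn2 ) ;          % same values, column vector

% Either form can be used as an index to delete the matched rows of file 2.
rows2(viaNonzeros) = [] ;
disp( rows2 ) ;
```

The only difference between the two forms is the orientation of the result, which does not matter when it is used as an index.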

Cedric on 17 Aug 2015

Edited: Cedric on 17 Aug 2015

Just replace the patterns in the calls to REGEXP with:

'(?<=(\s|^))[^|]+'

This means: match

One or more characters (as many as possible) other than '|',
preceded by either a white space (which, in your case, is the carriage return of the previous line) or the beginning of the string.

PS1: if execution speed is important, you should profile Per's solution and mine, to see which one is the fastest in your specific case.

PS2: the previous patterns

'[^|]+(?=(\s|$))'

meant: match

One or more characters (as many as possible) other than '|',
followed by either a white space (which, in your case, is the carriage return after column 7) or the end of the string.

PS3:

[abc] is a set of characters to match (not a string, it means 'a' or 'b' or 'c').
[^abc] is a set of characters not to match (any character except 'a', 'b', or 'c').
+ means "one or more (as many as possible) of what precedes".
(?<=...) is a positive look-behind for whatever is between = and ).
(?=...) is a positive look-ahead for whatever is between = and ).
(..|..) is an OR operator.
\s means "white space", which includes spaces, tabs, carriage returns, and new line characters.
^ stands for the beginning of the string.
$ stands for the end of the string.

With that, you can decipher the patterns in principle.
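For a concrete check, the following sketch applies both patterns to a short string made of two rows joined by a carriage return. The first row is the example quoted in the question's comments; the second row and the variable names are made up for illustration.

```matlab
% Two rows joined by a carriage return (char(13)), mimicking the files.
% The second row is hypothetical.
row1 = '188.83785|27.56214|-14.4|18.931|0.398|~|SDSSJ123521.05+273343.6' ;
row2 = '10.5|20.25|-13.1|17.2|0.4|~|HYPOTHETICAL-ID' ;
content = [row1, char(13), row2] ;

% Last column: characters other than '|' followed by white space or the end.
lastCol  = regexp( content, '[^|]+(?=(\s|$))', 'match' ) ;

% First column: characters other than '|' preceded by white space or the start.
firstCol = regexp( content, '(?<=(\s|^))[^|]+', 'match' ) ;

disp( lastCol ) ;
disp( firstCol ) ;
```

Both patterns rely on the fields themselves containing no white space; a field with an embedded space would be split at that space.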

jgillis16 el 17 de Ag. de 2015

OK! That was very helpful!!! Thanks!

Next time, I'll have a little more clue to what I need to code and maybe I might get it done myself without bugging you guys :)

Iniciar sesión para comentar.

Answer 2

per isakson on 16 Aug 2015

Direct link to this answer

https://es.mathworks.com/matlabcentral/answers/233997-export-matched-lines-from-two-text-files#answer_189474

Edited: per isakson on 16 Aug 2015

Here is an example of a different approach to the task. The two output files, mwithrm21_reduced.txt and matches.txt, are identical to those of the other answer apart from the new line characters.

function    et = cssm()
%   et(1) = cssm_1();
    et(2) = cssm_2();
end
function    et = cssm_2()
    tic
    fid     = fopen( 'mwithrm21.txt', 'rt' );
    rows1   = textscan( fid, '%s', 'Delimiter','\n' );
    fseek( fid, 0, 'bof' );
    codes1  = textscan( fid, '%*s%*s%*s%*s%*s%*s%s', 'Delimiter','|' );
    fclose( fid );
    %
    fid     = fopen( 'virgomrmdist.txt', 'rt' );
    codes2  = textscan( fid, '%*s%*s%*s%*s%*s%*s%s', 'Delimiter','|' );
    fclose( fid );
    %
    ism = ismember( codes1{1}, codes2{1} );
    %
    fid = fopen( 'matches.txt', 'wt' );
    fprintf( fid, '%s\n', rows1{1}{ism} );
    fclose( fid ) ;
    %
    fid = fopen( 'mwithrm21_reduced.txt', 'wt' );
    fprintf( fid, '%s\n', rows1{1}{not(ism)} );
    fclose( fid );
    et = toc;
end
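The format string used above skips the first six fields of each row with '%*s' and keeps only the seventh. A minimal sketch of that behaviour, run on an in-memory string instead of a file, could look like this; the two rows are made-up placeholders.

```matlab
% Two hypothetical rows with seven '|'-separated fields each.
txt = sprintf( '%s\n%s', ...
               'a1|b1|c1|d1|e1|f1|ID-ONE', ...
               'a2|b2|c2|d2|e2|f2|ID-TWO' ) ;

% '%*s' reads a field and discards it; the final '%s' keeps column 7.
% TEXTSCAN accepts a character vector in place of a file identifier.
codes = textscan( txt, '%*s%*s%*s%*s%*s%*s%s', 'Delimiter', '|' ) ;

disp( codes{1} ) ;
```

Because the result holds one entry per row, it can be passed directly to ISMEMBER and the resulting logical vector used to index the full rows.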

jgillis16 on 17 Aug 2015

Hey thanks! I appreciate the alternative approach!

Answer 3

r r on 11 May 2021

Direct link to this answer

https://es.mathworks.com/matlabcentral/answers/233997-export-matched-lines-from-two-text-files#answer_697415

I have two files with numbers in the first column, some of which occur in both files. I want to print the lines that match between the two files, and the lines that differ:

%%%%%%%%%%%%%%%%%%%%%%% File 1
fid1 = fopen( 'E1.txt', 'rt' );
T1 = textscan(fid1,'%s', 'delimiter', '\n');   % one cell per line
fclose( fid1 );
%%%%%%%%%%%%%%%%%%%%%%% File 2
fid2 = fopen( 'G1.txt', 'rt' );
T2 = textscan(fid2,'%s', 'delimiter', '\n');   % one cell per line
fclose( fid2 );
%%%%%%%%%%%%%%%%%%%%%%%
% Compare the cell arrays of lines directly. Converting to char matrices
% with char(T1{:}) pads the lines and fails with 'rows' when the two
% files have different line widths.
% Note: this compares whole lines, and INTERSECT/SETDIFF return sorted,
% unique results.
% Similar data between the two files:
C = intersect(T1{1}, T2{1});
% Lines of file 1 that are not in file 2. SETDIFF is used here because
% VISDIFF is a visual comparison tool and does not return these outputs.
B = setdiff(T1{1}, T2{1});
%%%%%%%%%%%%%%%%%%%%%%% Print output
fid = fopen( 'Similar.txt', 'wt' );    % Print all similar lines
fprintf(fid, '%s\n', C{:});
fclose( fid );
fid = fopen( 'Different.txt', 'wt' );  % Print all different lines
fprintf(fid, '%s\n', B{:});
fclose( fid );
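If the comparison should look at the number in the first column only, rather than at whole lines, one possible sketch is shown below. It assumes the first column is separated from the rest of the line by white space, which the post does not state, and it uses made-up lines in place of E1.txt and G1.txt.

```matlab
% Hypothetical lines standing in for the contents of the two files.
lines1 = {'101 alpha'; '102 beta'; '103 gamma'} ;
lines2 = {'103 delta'; '101 alpha'; '999 omega'} ;

% First column = leading run of non-white-space characters on each line.
key1 = regexp( lines1, '^\S+', 'match', 'once' ) ;
key2 = regexp( lines2, '^\S+', 'match', 'once' ) ;

% Lines of file 1 whose first-column value also occurs in file 2.
isCommon  = ismember( key1, key2 ) ;
similar   = lines1(isCommon) ;
different = lines1(~isCommon) ;

fprintf( 'Similar:\n' ) ;   fprintf( '  %s\n', similar{:} ) ;
fprintf( 'Different:\n' ) ; fprintf( '  %s\n', different{:} ) ;
```

Unlike INTERSECT and SETDIFF, logical indexing with the ISMEMBER result keeps the original line order and any duplicate lines.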
