How do I use textscan to extract only the numbers that match a certain pattern (not all the numbers) from a sentence in a text file?
Shiyu Yang
on 21 Mar 2021
Commented: Mathieu NOE
on 23 Mar 2021
I am new to MATLAB and really struggling with this. I am trying to extract numbers at certain locations, or matching certain patterns, from the sentences in a text file, but I have no idea how to filter out the other numbers; I only know how to extract all of them. For example, the text file ABC.txt contains:
TJX was in top-20 3 times and got higher 2 times within 1 day(s), 66.67%. It went 6.23% higher on average
TJX was in top-100 32 times and got higher 22 times within 1 day(s), 68.75%. It went 2.80% higher on average
TJX was in top-200 56 times and got higher 43 times within 1 day(s), 76.79%. It went 2.63% higher on average
Your choice on 2021-03-19: TJX(-) 1599/1962
Your choice on 2021-03-18: TJX(-) 1365/2029
Your choice on 2021-03-17: TJX(+) 497/1898
Your choice on 2021-03-16: TJX(-) 1721/1973
Your choice on 2021-03-15: TJX(+) 369/2039
Your choice: AMT since 2020-01-14
AMT was in top-20 1 times and got higher 0 times within 1 day(s), 0.00%. It went 0.00% higher on average
AMT was in top-100 11 times and got higher 8 times within 1 day(s), 72.73%. It went 1.31% higher on average
AMT was in top-200 20 times and got higher 16 times within 1 day(s), 80.00%. It went 2.03% higher on average
Your choice on 2021-03-19: AMT(+) 437/1962
Your choice on 2021-03-18: AMT(N) 1818/2029
Your choice on 2021-03-17: AMT(-) 1738/1898
Your choice on 2021-03-16: AMT(-) 1807/1973
Your choice on 2021-03-15: AMT(N) 259/2039
I want to extract all the information underlined above (I marked it manually) and write the sorted data to a text file named ABC_Reduced.txt, like this:
TJX.. 20:03/67% 100:32/69% 200:56/77% 0319(-)0318(-)0317(+)0316(-)0315(+)
AMT.. 20:01/00% 100:11/73% 200:20/80% 0319(+)0318(N)0317(-)0316(-)0315(N)
Any help or hint would be appreciated.
Thanks,
Esther
Accepted Answer
Mathieu NOE
on 22 Mar 2021
Hello Esther,
That was not my easiest code of the day, but I finally managed to get it done!
My input and output text files are attached.
Hope it helps!
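The core idea of the script is to pull every number out of a line and then keep only the ones at the wanted positions. Here is a minimal sketch of that step on one sample line from the question. The variable names (nums, topN, count, rate, piece) are only illustrative, and the zero-padded format is an assumption based on the requested output (for example 20:03/67%).

```matlab
% One sample line copied from the question's data
tline = 'TJX was in top-20 3 times and got higher 2 times within 1 day(s), 66.67%. It went 6.23% higher on average';

% Every integer or decimal number on the line, in order of appearance
nums = regexp(tline, '\d+(?:\.\d+)?', 'match');
% nums is {'20','3','2','1','66.67','6.23'}

% Keep only the wanted positions:
% 1 = top-N value, 2 = times in top-N, 5 = success rate in percent
topN  = nums{1};
count = str2double(nums{2});
rate  = round(str2double(nums{5}));

% Assumed output format: zero-padded to two digits, as in the requested file
piece = sprintf('%s:%02d/%02d%%', topN, count, rate);
disp(piece)   % 20:03/67%
```

Picking by position only works while every line has the same layout. A ticker name that contains digits would shift the indices.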
clc
clearvars
Filename_in  = 'dataABC.txt';
Filename_out = 'dataABC_reduced.txt';
[Names, str_all] = extract_data(Filename_in);
% export to text file (writecell requires R2019a or later)
writecell(str_all, Filename_out, "FileType", "text");
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function [Names, str_all] = extract_data(Filename)
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % 1st loop to collect all the names
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    fid = fopen(Filename);
    tline = fgetl(fid);
    k = 0;
    Name = {};
    while ischar(tline)
        if contains(tline, 'was in top')
            k = k+1; % count the matching lines
            Name{k} = deblank(extractBefore(tline, 'was in top'));
        end
        tline = fgetl(fid); % read the next line
    end
    fclose(fid); % close the file before reopening it for the 2nd pass
    Names = unique(Name, 'stable'); % keep the order of first appearance
    Names = Names';
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % 2nd loop to do the hard work
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % initialization
    k = 0;
    row = 1;
    str1 = ''; % "top-N" statistics of the current name
    str2 = ''; % dated choices of the current name
    Name_old = '';
    str_all = {};
    fid = fopen(Filename);
    tline = fgetl(fid);
    while ischar(tline)
        if contains(tline, 'was in top')
            k = k+1; % count the matching lines
            thisName = deblank(extractBefore(tline, 'was in top'));
            if k>1 && ~strcmp(thisName, Name_old)
                % the name changed: store the finished row
                % (the last row is stored after the loop)
                str_all{row} = [Names{row} '..' str1 ' ' str2];
                str1 = ''; % reset
                str2 = ''; % reset
                row = row+1; % increment
            end
            Name_old = thisName; % for the check of name change
            % retrieve all numerical contents, in order of appearance:
            % top-N, times in top-N, times higher, days, success rate, average gain
            A = regexp(tline, '\d+(?:\.\d+)?', 'match');
            % zero-pad to two digits to match the requested format, e.g. 20:03/67%
            str1 = [str1 sprintf(' %s:%02d/%02d%%', A{1}, str2double(A{2}), round(str2double(A{5})))];
        end
        if contains(tline, 'Your choice on ')
            % extractBetween returns a cell array for char input, hence char() below
            dateStr = extractBetween(tline, 'Your choice on', ':');
            month = extractBetween(dateStr, '-', '-');
            tmp = extractAfter(dateStr, '-');
            day = extractAfter(tmp, '-');
            signStr = extractBetween(tline, '(', ')');
            str2 = [str2 char(month) char(day) '(' char(signStr) ')'];
        end
        tline = fgetl(fid); % read the next line
    end
    % last and final concatenation
    str_all{row} = [Names{row} '..' str1 ' ' str2];
    str_all = str_all';
    fclose(fid);
end
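As an alternative to extracting every number and indexing by position, a pattern with capture groups can target only the wanted fields. The following is a minimal sketch under stated assumptions, not part of the original answer: the patterns statPat and datePat and all variable names are hypothetical, and they assume the two line layouts shown in the question.

```matlab
% Two sample lines copied from the question's data
statLine = 'AMT was in top-100 11 times and got higher 8 times within 1 day(s), 72.73%. It went 1.31% higher on average';
dateLine = 'Your choice on 2021-03-18: AMT(N) 1818/2029';

% Stat line: capture the name, the top-N value, the count and the success rate
statPat = '^(\w+) was in top-(\d+) (\d+) times .*?, (\d+(?:\.\d+)?)%';
s = regexp(statLine, statPat, 'tokens', 'once');
% s is {'AMT','100','11','72.73'}
statPiece = sprintf('%s:%02d/%02d%%', s{2}, str2double(s{3}), round(str2double(s{4})));

% Date line: capture the month, the day and the sign inside the parentheses
datePat = '^Your choice on \d{4}-(\d{2})-(\d{2}): \w+\((.)\)';
d = regexp(dateLine, datePat, 'tokens', 'once');
% d is {'03','18','N'}
datePiece = sprintf('%s%s(%s)', d{1}, d{2}, d{3});

disp(statPiece)   % 100:11/73%
disp(datePiece)   % 0318(N)
```

With the 'once' option, regexp returns an empty cell when a line does not match, so an isempty check on the result can replace the contains tests. For instance, the line 'Your choice: AMT since 2020-01-14' does not match datePat and would simply be skipped.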