Import data from a bad format

J T
J T on 10 May 2023
Commented: dpb on 11 May 2023
Hello, I have a set of data files that were saved in a bad format (basically, they were saved from Python as lists of NumPy arrays).
An example data file looks like this. Each file is supposed to be imported into MATLAB as a matrix, where the contents of eac [...] go into one row, for as many rows as the number of [...] the file contains. I am having trouble importing these files, and it is too expensive to regenerate the data. Could anyone help me, please?
*Note: I attached a zip file of an example data file .dat
*Note: I also converted an example from the source data from .dat to .txt to upload here
[0.01643466 0.014102 0.00989389 0.00854453 0.00811339 0.00641578
0.00615053 0.00540413 0.00452342 0.00427268 0.0041174 0.00352849
0.00273467 0.00265508 0.00239323 0.00225965 0.00199268 0.00180934
0.00174052 0.00154865 0.00143824 0.00140056 0.00130063 0.00111959
0.00085831]
[0.01242517 0.00959429 0.00663475 0.00480379 0.0041159 0.00370299
0.00346792 0.00315736 0.00289833 0.00248943 0.00233303 0.00205719
0.00184254 0.0016187 0.00137933 0.00123405 0.00114122 0.00100773
0.00094038 0.00088898 0.00078643 0.00077108 0.0006717 0.00062967
0.00058109]
[ 2.71704623e-03 2.10584618e-03 8.72114136e-04 7.73112590e-04
5.71653378e-04 5.33790412e-04 3.39630885e-04 2.40184459e-04
1.30327127e-04 8.07570547e-05 4.93676189e-05 3.99133858e-05
-6.96552090e-05 -8.84689362e-05 -1.73745252e-04 -1.92295775e-04
-2.88978292e-04 -3.33804546e-04 -4.48600012e-04 -5.03108816e-04
-6.09854318e-04 -6.76489121e-04 -7.41927073e-04 -8.22272102e-04
-1.01214861e-03]
[ 2.48950496e-03 1.32848678e-03 7.77518243e-04 4.46048853e-04
1.82546718e-04 5.68524734e-05 -2.03947611e-05 -1.22789817e-04
-1.42331199e-04 -2.27905262e-04 -2.54901789e-04 -3.21797964e-04
-4.10908018e-04 -4.31102320e-04 -5.76116105e-04 -6.20647464e-04
-6.61513106e-04 -8.03798804e-04 -8.85422390e-04 -9.60254905e-04
-1.05730808e-03 -1.21679564e-03 -1.29680491e-03 -1.65221752e-03
-1.89191346e-03]
[0.01148437 0.00831067 0.00569898 0.00435051 0.00369133 0.00313336
0.00282179 0.00252201 0.00221526 0.0020089 0.00178797 0.00135555
0.00117 0.00106878 0.00099295 0.00081433 0.00073677 0.00068778
0.00068557 0.00063079 0.00057153 0.00053233 0.0004835 0.00046683
0.00042318]
[0.01074849 0.00739927 0.00473212 0.00377076 0.00318848 0.00255984
0.00228395 0.00197474 0.00166971 0.00144228 0.00128842 0.00088904
0.00081689 0.00072367 0.00064738 0.00060256 0.00053549 0.00049838
0.00046984 0.00042499 0.0003706 0.00034885 0.00028414 0.0002643
0.00023334]
........
  4 comments
J T
J T on 10 May 2023
@Walter Roberson each [] group is split across multiple lines (depending on the number format?), but yes, there are always 25 entries in one [] group
J T
J T on 10 May 2023
@Walter Roberson It appears that in decimal format the 25 entries are split across 5 lines, and in scientific format they are split across 7 lines
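Before relying on a fixed column count, it may be worth confirming that every bracket group really holds the same number of values. The following is only a minimal sketch, assuming the whole file fits in memory; the inline sample string is made up and stands in for the real file, and the variable names are illustrative:

```matlab
% Made-up stand-in for the real file: one decimal group and one scientific
% group, each wrapped over two lines the way NumPy prints long arrays.
txt = sprintf(['[0.5 0.25 0.125\n 0.1]\n', ...
               '[ 2.5e-03 -1.0e-04\n -3.0e-05  4.0e-06]\n']);
% For the real data, replace the line above with: txt = fileread('example.txt');

% Capture the text between each "[" and its closing "]" (line breaks included).
groups = regexp(txt, '\[([^\]]*)\]', 'tokens');

% Count how many numbers sscanf can read from each group.
counts = cellfun(@(g) numel(sscanf(g{1}, '%f')), groups);

disp(counts)   % every element should be the same (25 for the files in this thread)
```

If the counts differ between groups, a fixed-width format such as 25 repeated %f fields would not be a safe choice.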


Accepted Answer

Stephen23
Stephen23 on 11 May 2023
Edited: Stephen23 on 11 May 2023
TEXTSCAN is very efficient, and imports numeric data as numeric (i.e. no fiddling around with text):
fmt = repmat('%f',1,25);
fid = fopen('example.txt');
out = textscan(fid,fmt,'EndOfLine',']','Whitespace',' \b\t\r\n[', 'CollectOutput',true);
fclose(fid);
mat = out{1}
mat = 250×25
 0.0164 0.0141 0.0099 0.0085 0.0081 0.0064 0.0062 0.0054 0.0045 0.0043 0.0041 0.0035 0.0027 0.0027 0.0024 0.0023 0.0020 0.0018 0.0017 0.0015 0.0014 0.0014 0.0013 0.0011 0.0009
 0.0124 0.0096 0.0066 0.0048 0.0041 0.0037 0.0035 0.0032 0.0029 0.0025 0.0023 0.0021 0.0018 0.0016 0.0014 0.0012 0.0011 0.0010 0.0009 0.0009 0.0008 0.0008 0.0007 0.0006 0.0006
 0.0115 0.0083 0.0057 0.0044 0.0037 0.0031 0.0028 0.0025 0.0022 0.0020 0.0018 0.0014 0.0012 0.0011 0.0010 0.0008 0.0007 0.0007 0.0007 0.0006 0.0006 0.0005 0.0005 0.0005 0.0004
 0.0107 0.0074 0.0047 0.0038 0.0032 0.0026 0.0023 0.0020 0.0017 0.0014 0.0013 0.0009 0.0008 0.0007 0.0006 0.0006 0.0005 0.0005 0.0005 0.0004 0.0004 0.0003 0.0003 0.0003 0.0002
 0.0103 0.0073 0.0041 0.0034 0.0030 0.0026 0.0023 0.0019 0.0016 0.0014 0.0011 0.0008 0.0007 0.0006 0.0006 0.0005 0.0004 0.0004 0.0003 0.0003 0.0003 0.0003 0.0002 0.0002 0.0002
 0.0103 0.0072 0.0039 0.0032 0.0028 0.0026 0.0022 0.0019 0.0016 0.0014 0.0010 0.0008 0.0008 0.0006 0.0006 0.0005 0.0004 0.0004 0.0003 0.0003 0.0003 0.0002 0.0002 0.0002 0.0001
 0.0105 0.0071 0.0037 0.0027 0.0026 0.0023 0.0021 0.0018 0.0015 0.0013 0.0010 0.0008 0.0008 0.0006 0.0005 0.0005 0.0004 0.0003 0.0003 0.0003 0.0002 0.0002 0.0002 0.0002 0.0001
 0.0108 0.0070 0.0035 0.0026 0.0024 0.0022 0.0019 0.0016 0.0013 0.0012 0.0009 0.0008 0.0007 0.0006 0.0005 0.0004 0.0004 0.0003 0.0003 0.0003 0.0002 0.0002 0.0002 0.0001 0.0001
 0.0110 0.0068 0.0035 0.0026 0.0023 0.0022 0.0018 0.0014 0.0012 0.0010 0.0009 0.0008 0.0007 0.0006 0.0005 0.0004 0.0004 0.0003 0.0003 0.0002 0.0002 0.0002 0.0001 0.0001 -0.0000
 0.0111 0.0065 0.0036 0.0028 0.0023 0.0021 0.0018 0.0013 0.0013 0.0010 0.0009 0.0008 0.0007 0.0007 0.0005 0.0004 0.0004 0.0003 0.0003 0.0002 0.0002 0.0001 0.0001 0.0001 -0.0000
Automagically detecting the matrix size also works, but is not documented:
fid = fopen('example.txt');
out = textscan(fid,'','EndOfLine',']','Whitespace',' \b\t\r\n[', 'CollectOutput',true);
fclose(fid);
mat = out{1}
mat = 250×25
 0.0164 0.0141 0.0099 0.0085 0.0081 0.0064 0.0062 0.0054 0.0045 0.0043 0.0041 0.0035 0.0027 0.0027 0.0024 0.0023 0.0020 0.0018 0.0017 0.0015 0.0014 0.0014 0.0013 0.0011 0.0009
 0.0124 0.0096 0.0066 0.0048 0.0041 0.0037 0.0035 0.0032 0.0029 0.0025 0.0023 0.0021 0.0018 0.0016 0.0014 0.0012 0.0011 0.0010 0.0009 0.0009 0.0008 0.0008 0.0007 0.0006 0.0006
 0.0115 0.0083 0.0057 0.0044 0.0037 0.0031 0.0028 0.0025 0.0022 0.0020 0.0018 0.0014 0.0012 0.0011 0.0010 0.0008 0.0007 0.0007 0.0007 0.0006 0.0006 0.0005 0.0005 0.0005 0.0004
 0.0107 0.0074 0.0047 0.0038 0.0032 0.0026 0.0023 0.0020 0.0017 0.0014 0.0013 0.0009 0.0008 0.0007 0.0006 0.0006 0.0005 0.0005 0.0005 0.0004 0.0004 0.0003 0.0003 0.0003 0.0002
 0.0103 0.0073 0.0041 0.0034 0.0030 0.0026 0.0023 0.0019 0.0016 0.0014 0.0011 0.0008 0.0007 0.0006 0.0006 0.0005 0.0004 0.0004 0.0003 0.0003 0.0003 0.0003 0.0002 0.0002 0.0002
 0.0103 0.0072 0.0039 0.0032 0.0028 0.0026 0.0022 0.0019 0.0016 0.0014 0.0010 0.0008 0.0008 0.0006 0.0006 0.0005 0.0004 0.0004 0.0003 0.0003 0.0003 0.0002 0.0002 0.0002 0.0001
 0.0105 0.0071 0.0037 0.0027 0.0026 0.0023 0.0021 0.0018 0.0015 0.0013 0.0010 0.0008 0.0008 0.0006 0.0005 0.0005 0.0004 0.0003 0.0003 0.0003 0.0002 0.0002 0.0002 0.0002 0.0001
 0.0108 0.0070 0.0035 0.0026 0.0024 0.0022 0.0019 0.0016 0.0013 0.0012 0.0009 0.0008 0.0007 0.0006 0.0005 0.0004 0.0004 0.0003 0.0003 0.0003 0.0002 0.0002 0.0002 0.0001 0.0001
 0.0110 0.0068 0.0035 0.0026 0.0023 0.0022 0.0018 0.0014 0.0012 0.0010 0.0009 0.0008 0.0007 0.0006 0.0005 0.0004 0.0004 0.0003 0.0003 0.0002 0.0002 0.0002 0.0001 0.0001 -0.0000
 0.0111 0.0065 0.0036 0.0028 0.0023 0.0021 0.0018 0.0013 0.0013 0.0010 0.0009 0.0008 0.0007 0.0007 0.0005 0.0004 0.0004 0.0003 0.0003 0.0002 0.0002 0.0001 0.0001 0.0001 -0.0000
Avoid unnecessary complexity in your code.
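For anyone who wants to try the idea without the original attachment, here is a small self-contained sketch of the same TEXTSCAN call with two basic checks added. The temporary file, its made-up contents, and the names fname and nCols are assumptions for illustration only; the thread's files have 25 values per group rather than 4:

```matlab
% Write a tiny made-up file: two bracket groups of 4 values each,
% wrapped over several lines like the NumPy output in the question.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '[0.5 0.25 0.125\n 0.1]\n[ 2.5e-03 -1.0e-04\n -3.0e-05  4.0e-06]');
fclose(fid);

nCols = 4;                                   % would be 25 for the files in this thread
fid = fopen(fname, 'r');
assert(fid ~= -1, 'Could not open %s', fname);

% "]" ends a logical row; "[" and line breaks are treated as whitespace.
out = textscan(fid, repmat('%f', 1, nCols), ...
    'EndOfLine', ']', 'Whitespace', ' \b\t\r\n[', 'CollectOutput', true);
fclose(fid);
delete(fname);

mat = out{1};
% Sanity checks: expected width and no missing values.
assert(size(mat, 2) == nCols && ~any(isnan(mat(:))), 'Unexpected file layout');
disp(size(mat))
```

The final assert is there so that a file with a different group length would fail loudly instead of silently producing a misaligned matrix.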
  2 comments
J T
J T on 11 May 2023
This is amazing! It also works in R2020a! Thank you!
dpb
dpb on 11 May 2023
Good thinking to use the closing bracket as the newline character, @Stephen23; that didn't occur to me in my initial response to Walter's counted attempt, which fails because the count changes; hence the text processing...


More Answers (3)

Walter Roberson
Walter Roberson on 10 May 2023
If it is stored in a file and it is always exactly 25 entries per logical row, then you could use textscan,
PerRow = 25;
fmt = "[" + repmat('%f', 1, PerRow) + "]";
FID = fopen(FILENAME, 'r');
output = cell2mat( textscan(FID, fmt) );
fclose(FID)
  3 comments
dpb
dpb on 10 May 2023
As requested, attach a section of the text file in a usable format, not as a zipped file..."help us help you!"
J T
J T on 10 May 2023
@dpb Hi, it doesn't allow me to upload the raw .dat file, which is why I zipped it. I am going to convert it to .txt and give that a try as well.



dpb
dpb on 11 May 2023
Edited: dpb on 11 May 2023
The '%g' format has struck again -- that's what killed @Walter Roberson's approach. While not the most efficient, a simple way in MATLAB would be
f=readlines('example.txt'); % import as string array
f=strrep(f,"[",""); % remove the brackets
f=strrep(f,"]",""); % remove the brackets
f=join(f); % turn into long string
f=strtrim(split(f)); % convert to array
f=f(strlength(f)>0);
data=str2double(f); % convert to numeric
whos data
  Name         Size            Bytes  Class     Attributes

  data      6250x1             50000  double

data=reshape(data,25,[]).'; % 25 values per column, then transpose so each group becomes a row
data(1:3,:)
ans = 3×25
 0.0164 0.0141 0.0099 0.0085 0.0081 0.0064 0.0062 0.0054 0.0045 0.0043 0.0041 0.0035 0.0027 0.0027 0.0024 0.0023 0.0020 0.0018 0.0017 0.0015 0.0014 0.0014 0.0013 0.0011 0.0009
 0.0124 0.0096 0.0066 0.0048 0.0041 0.0037 0.0035 0.0032 0.0029 0.0025 0.0023 0.0021 0.0018 0.0016 0.0014 0.0012 0.0011 0.0010 0.0009 0.0009 0.0008 0.0008 0.0007 0.0006 0.0006
 0.0115 0.0083 0.0057 0.0044 0.0037 0.0031 0.0028 0.0025 0.0022 0.0020 0.0018 0.0014 0.0012 0.0011 0.0010 0.0008 0.0007 0.0007 0.0007 0.0006 0.0006 0.0005 0.0005 0.0005 0.0004
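A note on the reshape step: MATLAB fills arrays column by column, so to turn one long vector into rows of fixed length, the values of each group have to be placed in a column first and the result transposed. A tiny sketch with made-up numbers (v and nPerRow are illustrative names; nPerRow would be 25 here):

```matlab
v = (1:6).';                          % stand-in for the long column of parsed values
nPerRow = 3;                          % 25 for the files in this thread

wrong = reshape(v, [], nPerRow);      % column-major fill: consecutive values run DOWN the columns
right = reshape(v, nPerRow, []).';    % one group per column, then transpose to one group per row

disp(wrong)   % [1 3 5; 2 4 6]
disp(right)   % [1 2 3; 4 5 6]
```

Only the second form keeps consecutive values from the file together in the same row.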
Alternatively,
f=readlines('example.txt'); % import as string array
f=split(join(f),']'); % turn into array by section
f=f(strlength(f)>0);
f=strtrim(f);
f=extractAfter(f,"[");
f=f(strlength(f)>0);
data=cell2mat(arrayfun(@(l)str2double(split(strtrim(l))).',f,'uni',0));
ans = 2×1 string array
"0.01643466 0.014102 0.00989389 0.00854453 0.00811339 0.00641578 0.00615053 0.00540413 0.00452342 0.00427268 0.0041174 0.00352849 0.00273467 0.00265508 0.00239323 0.00225965 0.00199268 0.00180934 0.00174052 0.00154865 0.00143824 0.00140056 0.00130063 0.00111959 0.00085831"
"0.00837889 0.00778493 0.00703359 0.00646615 0.00562914 0.00480321 0.00431015 0.00361664 0.00308546 0.00267183 0.0022049 0.00195752 0.00153126 0.00126947 0.00105491 0.00103345 0.0009544 0.00088708 0.00083375 0.000798 0.00070615 0.00065023 0.00062888 0.00053734 0.00047918"
[data(1:3,:);data(end-3:end,:)]
  Name        Size            Bytes  Class     Attributes

  data      250x25            50000  double

ans = 7×25
 0.0164 0.0141 0.0099 0.0085 0.0081 0.0064 0.0062 0.0054 0.0045 0.0043 0.0041 0.0035 0.0027 0.0027 0.0024 0.0023 0.0020 0.0018 0.0017 0.0015 0.0014 0.0014 0.0013 0.0011 0.0009
 0.0124 0.0096 0.0066 0.0048 0.0041 0.0037 0.0035 0.0032 0.0029 0.0025 0.0023 0.0021 0.0018 0.0016 0.0014 0.0012 0.0011 0.0010 0.0009 0.0009 0.0008 0.0008 0.0007 0.0006 0.0006
 0.0115 0.0083 0.0057 0.0044 0.0037 0.0031 0.0028 0.0025 0.0022 0.0020 0.0018 0.0014 0.0012 0.0011 0.0010 0.0008 0.0007 0.0007 0.0007 0.0006 0.0006 0.0005 0.0005 0.0005 0.0004
 0.0084 0.0078 0.0070 0.0065 0.0056 0.0048 0.0043 0.0036 0.0031 0.0027 0.0022 0.0020 0.0015 0.0013 0.0011 0.0010 0.0010 0.0009 0.0008 0.0008 0.0007 0.0006 0.0006 0.0005 0.0005
 0.0084 0.0078 0.0070 0.0065 0.0056 0.0048 0.0043 0.0036 0.0031 0.0027 0.0022 0.0020 0.0015 0.0013 0.0011 0.0010 0.0010 0.0009 0.0008 0.0008 0.0007 0.0006 0.0006 0.0005 0.0005
 0.0084 0.0078 0.0070 0.0065 0.0056 0.0048 0.0043 0.0036 0.0031 0.0027 0.0022 0.0020 0.0015 0.0013 0.0011 0.0010 0.0010 0.0009 0.0008 0.0008 0.0007 0.0006 0.0006 0.0005 0.0005
 0.0084 0.0078 0.0070 0.0065 0.0056 0.0048 0.0043 0.0036 0.0031 0.0027 0.0022 0.0020 0.0015 0.0013 0.0011 0.0010 0.0010 0.0009 0.0008 0.0008 0.0007 0.0007 0.0006 0.0005 0.0005
  2 comments
J T
J T on 11 May 2023
Hi, thank you so much for your input! However, I cannot run any version of MATLAB other than R2020a, and the readlines function doesn't seem to be available there.
dpb
dpb on 11 May 2023
Then use
f=textread('example.txt','%s','delimiter','\n','whitespace','');
f=string(strtrim(f));
instead. textread has been deprecated, but it's often still of real use/value where textscan is more trouble to deal with...
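Another way to get a line-per-element string array on releases that predate readlines (introduced in R2020b) might be fileread combined with splitlines, both of which are available in R2020a. This is only a sketch on a made-up temporary file; fname and its contents are assumptions for illustration:

```matlab
% Create a small made-up file so the snippet runs on its own.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '[0.5 0.25\n 0.125]\n[1e-3 2e-3\n 3e-3]\n');
fclose(fid);

% Read the whole file as one char vector, then split it at line breaks.
f = splitlines(string(fileread(fname)));
f = strtrim(f);                 % remove leading/trailing blanks on each line
f = f(strlength(f) > 0);        % drop the empty element left by the final newline
delete(fname);

disp(f)
```

The resulting f can then be passed through the same strrep/join/split steps as the output of readlines.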



J T
J T on 11 May 2023
Based on @dpb's and @Walter Roberson's answers, I worked out the following code, which works in R2020a:
FID = fopen('example.txt');
data = textscan(FID,'%s');
fclose(FID);
stringData = string(data{:}); % import as string array
f=strrep(stringData,"[",""); % remove the brackets
f=strrep(f,"]",""); % remove the brackets
f=join(f);% turn into long string
f=strtrim(split(f));% convert to array
f=f(strlength(f)>0);
data=str2double(strtrim(split(f))); % convert
data=reshape(data,[],length(data)/25)';
data(1:3,:)
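A more compact variant of the same text-processing idea, offered only as a sketch: blank out the brackets and let sscanf read every number in one pass. It assumes the file contains nothing but numbers, whitespace, and brackets; the sample string, nPerRow, and the other names are made up for illustration:

```matlab
% Made-up stand-in for the file contents (3 values per group instead of 25).
txt = sprintf('[0.5 0.25\n 0.125]\n[ 1.0e-03 -2.0e-03\n  3.0e-03]\n');
% For the real data, replace the line above with: txt = fileread('example.txt');

nPerRow = 3;                                          % 25 for the files in this thread

% Replace every "[" and "]" with a space, then read all numbers in file order.
vals = sscanf(regexprep(txt, '[\[\]]', ' '), '%f');

% Guard against a truncated or differently shaped file.
assert(mod(numel(vals), nPerRow) == 0, 'Value count is not a multiple of %d', nPerRow);

% One group per column, then transpose so each group becomes a row.
data = reshape(vals, nPerRow, []).';
disp(data)
```

Because sscanf skips all whitespace, including line breaks, the number of lines each group was wrapped over does not matter.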

Release

R2020a
