Find faster way than compose to format table

10 visualizaciones (últimos 30 días)
Catriona Fyffe
Catriona Fyffe el 9 de Abr. de 2024
Respondida: Catriona Fyffe el 28 de Abr. de 2024
I have a large table of data which needs to be output as a csv and then read into a model. For some reason (which I cannot solve) the model will not take in values with scientific number formating (e.g. very small or large numbers, so 8.8817e-04, rather than 0.000888). To get around this I turn the table into text and use compose, but this is slow as the data are large. I have tried using round, which is fast, but Matlab still displays in scientific notation for some values and these remain in the csv, which does not solve the problem. Any faster ideas welcome!
Here's an example. Note I know you don't need all the DateTime info, but it gives the form of the data. In reality I have 30 years of data at least, which is why it is a problem. It also need to use a seperate workstation or HPC as the string/compose lines lead to out of memory problems on my own laptop, even if it can store this size of table ok.
Many thanks!
StartDate=datetime('01/01/2019 00:00','InputFormat','dd/MM/yyyy HH:mm');
EndDate=datetime('31/10/2020 23:00','InputFormat','dd/MM/yyyy HH:mm');
DateTime = StartDate:hours(1):EndDate;
DateTime = DateTime';
datasize =size(DateTime,1);
No_St = 307;
Data = rand(datasize,No_St);
Test_table = array2table(Data);
Test_table.Time = DateTime;
Test_table = movevars(Test_table,"Time","Before",1);
Table_str = string(Test_table{:,2:end}); %convert to string
Table_str = compose('%.6f',Table_str); %Reduce precision
  3 comentarios
Catriona Fyffe
Catriona Fyffe el 10 de Abr. de 2024
Hi Voss, yes I write the whole table into the file (after this code I put the Data back in to the table).
Catriona Fyffe
Catriona Fyffe el 10 de Abr. de 2024
I use normally:
Table2 = table(Test_table.Time);
Table2{:,2:end}=Table_str;
writetable(Table2,'Table_2_out.csv','delimiter',';');

Iniciar sesión para comentar.

Respuesta aceptada

Athanasios Paraskevopoulos
Athanasios Paraskevopoulos el 9 de Abr. de 2024
Here’s an approach you can try:
1. Instead of converting the whole table to strings at once, process the data in smaller chunks. This can help manage memory usage more effectively.
2. Use MATLAB’s `fprintf` function to write directly to a CSV file in the desired format. This avoids creating large temporary string arrays.
Here’s an example of how you can implement this:
matlab
StartDate = datetime('01/01/2019 00:00', 'InputFormat', 'dd/MM/yyyy HH:mm');
EndDate = datetime('31/10/2020 23:00', 'InputFormat', 'dd/MM/yyyy HH:mm');
DateTime = StartDate:hours(1):EndDate;
DateTime = DateTime';
datasize = size(DateTime, 1);
No_St = 307;
Data = rand(datasize, No_St);
Test_table = array2table(Data);
Test_table.Time = DateTime;
Test_table = movevars(Test_table, "Time", "Before", 1);
% Define chunk size
chunkSize = 10000; % Adjust this value based on available memory
% Open file for writing
fid = fopen('output.csv', 'w');
% Write header
fprintf(fid, 'Time,');
fprintf(fid, '%s,', string(Test_table.Properties.VariableNames(2:end-1)));
fprintf(fid, '%s\n', string(Test_table.Properties.VariableNames(end)));
% Write data in chunks
for i = 1:chunkSize:height(Test_table)
endIndex = min(i + chunkSize - 1, height(Test_table));
chunk = Test_table(i:endIndex, :);
% Write each row
for j = 1:size(chunk, 1)
fprintf(fid, '%s,', datestr(chunk.Time(j), 'dd/mm/yyyy HH:MM'));
fprintf(fid, '%.6f,', chunk{j, 2:end-1});
fprintf(fid, '%.6f\n', chunk{j, end});
end
end
% Close file
fclose(fid);
This script processes the table in chunks and writes directly to a CSV file, formatting the numeric values to avoid scientific notation. The `chunkSize` variable can be adjusted based on your system’s memory capacity to optimize performance.
I hope that will help you
  2 comentarios
Catriona Fyffe
Catriona Fyffe el 10 de Abr. de 2024
Thanks so much for your help!
I like the idea of spliting it into chunks and using fprintf is something I have not thought of. It is maybe a bit slow to write every row, but I can simplify this.
Stephen23
Stephen23 el 10 de Abr. de 2024
" It is maybe a bit slow to write every row, but I can simplify this."
You do not need to write each row in a loop, Voss showed how to use FPRINTF effectively.
Avoid deprecated DATESTR.

Iniciar sesión para comentar.

Más respuestas (2)

Catriona Fyffe
Catriona Fyffe el 28 de Abr. de 2024
Hi Everyone,
I worked to combine Athanasios and Voss' answers together to get exactly what I needed: the full table exported with the DateTime as well as the data, semi-colon seperated, and as fast as possible. To be fair I was not completely clear in the questions that I wanted the DateTime included in the output. I thought I would include the result in case it is useful to others. It is actually not so straightforward to use fprintf to do this (at least I thought not!) since to add the DateTime and data in as a block (not a row at a time) you need to combine them, and fprintf will not take cell arrays, but turning everything into text together using string and num2str works and is quite fast. As you can see the fprintf method is 22 times faster than the string+compose method. Its strange there is such a difference, but there you go!
%% NEW
tic
fid = fopen('test_matrix.csv','w');
fprintf(fid, '%s;','Time');
fprintf(fid, '%s;', string(Test_table.Properties.VariableNames(2:end-1))); %The comma also puts to new column
fprintf(fid, '%s\r\n', string(Test_table.Properties.VariableNames(end))); %\r\n puts to a new line
M = strcat(string(Test_table.Time,'dd.MM.yyyy HH:mm'),{';'}, num2str(Test_table{:,2:end},' %.4f;'));
fprintf(fid,'%s\r\n',M);
fclose(fid);
toc
Elapsed time is 5.315610 seconds.
%% OLD
tic
Table_str = string(Test_table{:,2:end}); %convert to string
Table_str = compose('%.4f',Table_str); %Reduce precision
Table2 = table(Test_table.Time);
Table2{:,2:No_St+1}=Table_str;
Table2.Properties.VariableNames = Test_table.Properties.VariableNames;
Table2.Time.Format='dd.MM.yyyy HH:mm';
writetable(Table2,'Table_2_out.csv','delimiter',';');
toc
Elapsed time is 114.677375 seconds.

Voss
Voss el 9 de Abr. de 2024
Rather than trying to find a faster alternative to compose to convert table data to appropriately formatting strings, the real problem you want to solve is: How to write a csv file with appropriately formatted numbers. Right?
Your code doesn't show how you are writing the file or what you are writing to it (i.e., does it include the table variable names? does it include the datetime column?), so I'll assume you only need the data in Test_table{:,2:end} to be written to the file.
% Creating the table
StartDate=datetime('01/01/2019 00:00','InputFormat','dd/MM/yyyy HH:mm');
EndDate=datetime('31/10/2020 23:00','InputFormat','dd/MM/yyyy HH:mm');
DateTime = StartDate:hours(1):EndDate;
DateTime = DateTime';
datasize =size(DateTime,1);
No_St = 307;
Data = rand(datasize,No_St);
Data(:,1) = Data(:,1)*1e-6; % make some data small, for testing
Test_table = array2table(Data);
Test_table.Time = DateTime;
Test_table = movevars(Test_table,"Time","Before",1);
Here's one thing you can try, which is to use fprintf with the desired precision:
tic
M = Test_table{:,2:end}.';
fmt = [repmat('%.6f,',1,size(M,1)) '\n'];
fid = fopen('test_matrix.csv','w');
fprintf(fid,fmt,M);
fclose(fid);
toc
Elapsed time is 1.818894 seconds.
For timing comparison, using compose and writecell:
tic
Table_str = compose('%.6f',Test_table{:,2:end}); %Reduce precision
writecell(Table_str,'test_cell.csv');
toc
Elapsed time is 8.790608 seconds.
Note that if you open these files in Excel, some data may show up in scientific notation. It's not really in scientific notation; that's just Excel doing its thing. To see the actual data as it really is stored in the file, use another program to view it, such as Notepad++.
  2 comentarios
Catriona Fyffe
Catriona Fyffe el 10 de Abr. de 2024
Again, thanks for your help! Using fprintf is so much faster and not something I usually use. I do need to save the whole table, but I will combine a bit your answer with Athanasios' and look at the fpintf documentation to figure it out.
Catriona Fyffe
Catriona Fyffe el 10 de Abr. de 2024
I would accept you both to answer this, but I just chose Athanasios for mentioning fprintf first, but both answers were very useful :)

Iniciar sesión para comentar.

Categorías

Más información sobre Structures en Help Center y File Exchange.

Etiquetas

Productos


Versión

R2022b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by