What is the best way to save to .csv without losing precision and with column headers?

27 visualizaciones (últimos 30 días)
Hello-
tl;dr: What's the best way to save an m x n array from matlab to csv, with column headers that can be any text format (ie, include spaces and [ ]), without losing precision or clipping large values?
I am using Matlab (usually 2018a on Windows 7) to do various data manipulations on localizations from single molecule microscopy. These data tables are in large m x n arrays where each row is a molecule and each column is some characteristic about the molecule (for example, it's position in x y and z, its intensity, etc). These arrays are usually ~ 100,000 x 17 to >1,000,000 x 17.
In order to visualize the data and do some further processing I don't want to do in Matlab, I use another program called ThunderSTORM, a plugin for the image analysis suite FIJI. ThunderSTORM requires a .csv file with specific column headers that denote the measure and the units. Usually the headers are of the form, for example, "x [nm]", "y [nm]", "intensity [photons]", etc.
I had been using dlmwrite to write a header line to csv, then write the data underneath that, which works, except that a) dlmwrite truncates at 4 decimal plates, which is a problem, and b) dlmwrite for some reason is treating numbers greater than 9.9999 * 10^4 as just the base value without the exponent (so, 1.001 * 10^5 imports as 1.001 only). This is a problem for filtering the data as extreme values are either retained or removed.
When I used writetable instead of dlmwrite, I did not have this behavior and values >10^5 imported correctly, so it's not a ThunderSTORM thing. The only problem with using writetable is that the column names must follow the Matlab rules for variable names, so I cannot use "x [nm]" as a header. When ThunderSTORM doesn't get the column headers it expects, it sometimes tries to do mathematical conversions on that data, which I don't want (for example, it thinks the x values are in pixels, and tries to convert them to nm, when they are already in nm).
So my question is, what is the best way to save this data out to .csv, with the appropriate headers, without clipping large data (still not sure why dlmwrite is doing that) or losing precision? I was able to make a solution using fprintf, but it's clunky, and I'm wondering if there is a better way.
% what I want writetable to do, but it doesn't because the ColName variables are not allowed.
ColName = {'id' 'frame' 'x [nm]' 'y [nm]' 'z [nm]' 'sigma1 [nm]' 'sigma2 [nm]' 'intensity [photons]'...
'bkg [photons]' 'peakphot [photons]' 'crlbx [nm]' 'crlby [nm]' 'error [nm]' 'channel' 'area' 'firstrankarea' 'firstrankdensity'};
data = array2table(Tess561,'VariableNames',ColName);
writetable(data,'writetabletest.csv');
% my temporary solution that seems clunky, but does do what I want.
x = mydata(:,1:17);
fid = fopen('ouputfile.csv','w');
fprintf(fid,'id,frame,x [nm],y [nm],z [nm],sigma1 [nm],sigma2 [nm],intensity [photons],bkg [photons],peakphot [photons],CRLBX [nm],CRLBY [nm],uncertainty_xy [nm],channel,area,firstrankarea,firstrankdensity\n');
for i = 1:length(x)
fprintf(fid,'%f , %f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f\n',x(i,1),x(i,2),x(i,3),x(i,4),x(i,5),x(i,6),x(i,7),x(i,8),x(i,9),x(i,10),x(i,11),x(i,12),x(i,13),x(i,14),x(i,15),x(i,16),x(i,17));
end
fclose(fid);

Respuesta aceptada

James Tursa
James Tursa el 28 de Ag. de 2019
Editada: James Tursa el 28 de Ag. de 2019
Why are you using fprintf in a loop? Can't you do it all in one call? E.g., this
for i = 1:length(x)
fprintf(fid,'%f , %f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f\n',x(i,1),x(i,2),x(i,3),x(i,4),x(i,5),x(i,6),x(i,7),x(i,8),x(i,9),x(i,10),x(i,11),x(i,12),x(i,13),x(i,14),x(i,15),x(i,16),x(i,17));
end
becomes this (the transpose is to get the x elements in memory in the desired printing order)
fprintf(fid,'%f , %f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f ,%f\n',x.');
P.S., Since you are creating a text file instead of a binary file, you should probably be using 'wt' instead of 'w' in your fopen( ) call.
  4 comentarios
GolgiWhillikers
GolgiWhillikers el 1 de Sept. de 2019
Thanks
I ended up using fprintf with %f (actually what I ended up caring more about were the number of retained decimals places and it not clipping numbers larger than 5 significant digits). It works well.
James Tursa
James Tursa el 3 de Sept. de 2019
Note that you can specify the number of digits to print. E.g., %15.7f

Iniciar sesión para comentar.

Más respuestas (1)

Luna
Luna el 28 de Ag. de 2019
Hi,
Have you tried the functions writematrix or csvwrite ?
csvwrite is not recommended right now but you can try it.
  1 comentario
GolgiWhillikers
GolgiWhillikers el 28 de Ag. de 2019
I'll note that my for loop fprintf solution takes foreeevverrrr for the number of localizaitons I have (~2 million). Like minutes.
I had looked at writematrix, but it doesn't allow headers, and I can't use it in Matlab 2018, which I am stuck with at the moment. The problem with csvwrite is the same as baseline dlmwrite - it will truncate at 5 significant digits, which is I suppose what's happening with dlmwrite described above (ie, 100,000 becomes 1).
I guess I can specify some insane precision in dlmwrite, like precision 20 or something, since sometimes I have values that are like 4000.092583028582. Setting the precision super high does work, but does seem a little excessive. But if that's what I have to do, at least it does work (and is much faster than the loop, unsurprisingly).
ie
fid = fopen(savename,'w');
fprintf(fid,'%s\n',header);
fclose(fid);
dlmwrite(savename,mydatatosave,'-append','precision',20);

Iniciar sesión para comentar.

Categorías

Más información sobre Large Files and Big Data en Help Center y File Exchange.

Etiquetas

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by