Using accumarray to organize large CSV
2 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
I have a CSV with 4 columns, containing several thousands of rows of data. I have attached a sample data sheet with a small amount of data as an example.
I'm trying to write a code to import the data in a way that will be easy to create scatter plots/visualize the data, as well as calculate some basic stats (mean, median, std. dev., etc).
I want to group the data based on TWO of the columns: SITE NUMBER and DEPTH
The other two columns, d18O and date are what I need to plot/visualize
Essentially, my goal is to have separate variables or columns in a table that show d18O and DATE for each SITE NUMBER and DEPTH
I started using accumarray to accomplish this but I am having issues with the arrays, they are not showing up with all of the unique rows of data
T = readtable("data1.csv", 'VariableNamingRule','preserve');
[G,ID] = findgroups(T.("Site number"));
data = accumarray(G, T{:,4},[], @(x){T{x,:}});
0 comentarios
Respuesta aceptada
dpb
el 22 de Sept. de 2023
Editada: dpb
el 22 de Sept. de 2023
tT=readtable("data1.csv");
tT.Properties.VariableNames(2)={'Site'}; % shorten to be more convenient to use
head(tT)
G=grpstats(tT,{'Site','Depth'},{'mean','median','std'},'DataVars',{'Date','d18O'})
hSc=rowfun(@doit,tT,'GroupingVariables',{'Site','Depth'},'InputVariables',{'Date','d18O'},'OutputFormat','uniform');
function h=doit(x,y)
figure
h=scatter(x,y,'filled');
xlabel('Date'),ylabel('d18O')
end
2 comentarios
dpb
el 22 de Sept. de 2023
NOTA BENE that you can get much more creative with the user function...
tT=readtable("data1.csv");
tT.Properties.VariableNames(2)={'Site'}; % shorten to be more convenient to use
hSc=rowfun(@doit,tT,'GroupingVariables',{'Site','Depth'},'InputVariables',{'Date','d18O','Site','Depth'},'OutputFormat','uniform');
function h=doit(x,y,s,d)
figure
h=scatter(x,y,'filled');
xlabel('Date'),ylabel('d18O')
legend(compose('Site %3d Depth %3d',s(1),d(1)))
end
Más respuestas (1)
Voss
el 21 de Sept. de 2023
Editada: Voss
el 21 de Sept. de 2023
"my goal is to have separate variables or columns in a table that show d18O and DATE for each SITE NUMBER and DEPTH"
Something like this?
T = readtable("data1.csv", 'VariableNamingRule','preserve');
disp(T);
gvars = ["Site number","Depth"];
vars = setdiff(T.Properties.VariableNames,gvars);
T_summary = groupsummary(T,gvars,@(x){x});
T_summary = removevars(T_summary,"GroupCount");
T_summary = renamevars(T_summary,"fun1_"+vars,vars)
T_summary.Date{1}
T_summary.d18O{1}
1 comentario
Voss
el 22 de Sept. de 2023
Or maybe it's more convenient just to sort the table so that rows with the same Site number and Depth are together:
T = readtable("data1.csv", 'VariableNamingRule','preserve');
disp(T);
T_sorted = sortrows(T,["Site number","Depth"]);
disp(T_sorted);
Ver también
Categorías
Más información sobre Data Distribution Plots en Help Center y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!