Using accumarray to organize large CSV

Question

Emily el 21 de Sept. de 2023

0
Enlazar

Enlace directo a esta pregunta

https://es.mathworks.com/matlabcentral/answers/2024302-using-accumarray-to-organize-large-csv

Comentada: dpb el 22 de Sept. de 2023

data1.csv

I have a CSV with 4 columns, containing several thousands of rows of data. I have attached a sample data sheet with a small amount of data as an example.

I'm trying to write a code to import the data in a way that will be easy to create scatter plots/visualize the data, as well as calculate some basic stats (mean, median, std. dev., etc).

I want to group the data based on TWO of the columns: SITE NUMBER and DEPTH

The other two columns, d18O and date are what I need to plot/visualize

Essentially, my goal is to have separate variables or columns in a table that show d18O and DATE for each SITE NUMBER and DEPTH

I started using accumarray to accomplish this but I am having issues with the arrays, they are not showing up with all of the unique rows of data

T = readtable("data1.csv", 'VariableNamingRule','preserve');
[G,ID] = findgroups(T.("Site number"));                                            
data = accumarray(G, T{:,4},[], @(x){T{x,:}});

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Answer 1

dpb el 22 de Sept. de 2023

0
Enlazar

Enlace directo a esta respuesta

https://es.mathworks.com/matlabcentral/answers/2024302-using-accumarray-to-organize-large-csv#answer_1315552

Editada: dpb el 22 de Sept. de 2023

Abrir en MATLAB Online

data1.csv

tT=readtable("data1.csv");
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
tT.Properties.VariableNames(2)={'Site'};    % shorten to be more convenient to use
head(tT)
    d18O     Site    Date     Depth
    _____    ____    _____    _____

     -1.3    1004    44204     20  
     -1.9    1004    44263     20  
     -1.2    1004    44295     20  
     -1.7    1004    44567     20  
     -1.5    1004    44235     20  
     -9.3    1009    44323     50  
    -7.72    1009    44352     50  
    -7.07    1009    44445     50  
G=grpstats(tT,{'Site','Depth'},{'mean','median','std'},'DataVars',{'Date','d18O'})
G = 6×9 table
               Site    Depth    GroupCount    mean_Date    median_Date    std_Date    mean_d18O    median_d18O    std_d18O
               ____    _____    __________    _________    ___________    ________    _________    ___________    ________

    1004_20    1004     20           5          44313         44263        146.03        -1.52         -1.5       0.28636 
    1004_80    1004     80           6          44404         44489        339.01      -8.5533       -8.695       0.30137 
    1009_50    1009     50          11          44441         44412        158.55      -8.0855        -8.22        1.0068 
    1012_40    1012     40           3          44381         44263        235.02        -5.81        -4.38        2.6254 
    1014_20    1014     20           5          43936         43984        142.16       -6.326        -6.22       0.38546 
    1023_20    1023     20           2          44067         44067        237.59        -4.39        -4.39       0.39598 

rowfun

hSc=rowfun(@doit,tT,'GroupingVariables',{'Site','Depth'},'InputVariables',{'Date','d18O'},'OutputFormat','uniform');

function h=doit(x,y)

figure

h=scatter(x,y,'filled');

xlabel('Date'),ylabel('d18O')

end

2 comentarios
Mostrar NingunoOcultar Ninguno

Emily el 22 de Sept. de 2023

this produces what I need, thank you!

dpb el 22 de Sept. de 2023

Abrir en MATLAB Online

data1.csv

NOTA BENE that you can get much more creative with the user function...

tT=readtable("data1.csv");

Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.

tT.Properties.VariableNames(2)={'Site'}; % shorten to be more convenient to use

hSc=rowfun(@doit,tT,'GroupingVariables',{'Site','Depth'},'InputVariables',{'Date','d18O','Site','Depth'},'OutputFormat','uniform');

function h=doit(x,y,s,d)

figure

h=scatter(x,y,'filled');

xlabel('Date'),ylabel('d18O')

legend(compose('Site %3d Depth %3d',s(1),d(1)))

end

Iniciar sesión para comentar.

Answer 2

Voss el 21 de Sept. de 2023

0
Enlazar

Enlace directo a esta respuesta

https://es.mathworks.com/matlabcentral/answers/2024302-using-accumarray-to-organize-large-csv#answer_1315547

Editada: Voss el 21 de Sept. de 2023

Abrir en MATLAB Online

data1.csv

"my goal is to have separate variables or columns in a table that show d18O and DATE for each SITE NUMBER and DEPTH"

Something like this?

T = readtable("data1.csv", 'VariableNamingRule','preserve');
disp(T);
    d18O     Site number    Date     Depth
    _____    ___________    _____    _____

     -1.3       1004        44204     20  
     -1.9       1004        44263     20  
     -1.2       1004        44295     20  
     -1.7       1004        44567     20  
     -1.5       1004        44235     20  
     -9.3       1009        44323     50  
    -7.72       1009        44352     50  
    -7.07       1009        44445     50  
    -9.88       1009        44412     50  
    -6.79       1009        44264     50  
    -6.58       1009        44538     50  
    -8.51       1009        44748     50  
    -8.34       1009        44652     50  
    -8.31       1009        44385     50  
    -8.22       1009        44235     50  
    -8.22       1009        44501     50  
    -4.38       1012        44263     40  
    -4.21       1012        44229     40  
    -8.84       1012        44652     40  
    -6.87       1014        43984     20  
    -6.54       1014        43803     20  
    -6.22       1014        44102     20  
    -6.12       1014        43773     20  
    -5.88       1014        44018     20  
    -4.67       1023        44235     20  
    -4.11       1023        43899     20  
    -8.83       1004        44593     80  
    -8.74       1004        44385     80  
    -8.75       1004        44685     80  
    -8.65       1004        44719     80  
    -8.24       1004        44204     80  
    -8.11       1004        43838     80  
gvars = ["Site number","Depth"];
vars = setdiff(T.Properties.VariableNames,gvars);
T_summary = groupsummary(T,gvars,@(x){x});
T_summary = removevars(T_summary,"GroupCount");
T_summary = renamevars(T_summary,"fun1_"+vars,vars)
T_summary = 6×4 table
    Site number    Depth        d18O             Date     
    ___________    _____    _____________    _____________

       1004         20      { 5×1 double}    { 5×1 double}
       1004         80      { 6×1 double}    { 6×1 double}
       1009         50      {11×1 double}    {11×1 double}
       1012         40      { 3×1 double}    { 3×1 double}
       1014         20      { 5×1 double}    { 5×1 double}
       1023         20      { 2×1 double}    { 2×1 double}
T_summary.Date{1}
ans = 5×1
       44204
       44263
       44295
       44567
       44235
T_summary.d18O{1}
ans = 5×1
   -1.3000
   -1.9000
   -1.2000
   -1.7000
   -1.5000

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

Voss el 22 de Sept. de 2023

Abrir en MATLAB Online

data1.csv

Or maybe it's more convenient just to sort the table so that rows with the same Site number and Depth are together:

T = readtable("data1.csv", 'VariableNamingRule','preserve');
disp(T);
    d18O     Site number    Date     Depth
    _____    ___________    _____    _____

     -1.3       1004        44204     20  
     -1.9       1004        44263     20  
     -1.2       1004        44295     20  
     -1.7       1004        44567     20  
     -1.5       1004        44235     20  
     -9.3       1009        44323     50  
    -7.72       1009        44352     50  
    -7.07       1009        44445     50  
    -9.88       1009        44412     50  
    -6.79       1009        44264     50  
    -6.58       1009        44538     50  
    -8.51       1009        44748     50  
    -8.34       1009        44652     50  
    -8.31       1009        44385     50  
    -8.22       1009        44235     50  
    -8.22       1009        44501     50  
    -4.38       1012        44263     40  
    -4.21       1012        44229     40  
    -8.84       1012        44652     40  
    -6.87       1014        43984     20  
    -6.54       1014        43803     20  
    -6.22       1014        44102     20  
    -6.12       1014        43773     20  
    -5.88       1014        44018     20  
    -4.67       1023        44235     20  
    -4.11       1023        43899     20  
    -8.83       1004        44593     80  
    -8.74       1004        44385     80  
    -8.75       1004        44685     80  
    -8.65       1004        44719     80  
    -8.24       1004        44204     80  
    -8.11       1004        43838     80  
T_sorted = sortrows(T,["Site number","Depth"]);
disp(T_sorted);
    d18O     Site number    Date     Depth
    _____    ___________    _____    _____

     -1.3       1004        44204     20  
     -1.9       1004        44263     20  
     -1.2       1004        44295     20  
     -1.7       1004        44567     20  
     -1.5       1004        44235     20  
    -8.83       1004        44593     80  
    -8.74       1004        44385     80  
    -8.75       1004        44685     80  
    -8.65       1004        44719     80  
    -8.24       1004        44204     80  
    -8.11       1004        43838     80  
     -9.3       1009        44323     50  
    -7.72       1009        44352     50  
    -7.07       1009        44445     50  
    -9.88       1009        44412     50  
    -6.79       1009        44264     50  
    -6.58       1009        44538     50  
    -8.51       1009        44748     50  
    -8.34       1009        44652     50  
    -8.31       1009        44385     50  
    -8.22       1009        44235     50  
    -8.22       1009        44501     50  
    -4.38       1012        44263     40  
    -4.21       1012        44229     40  
    -8.84       1012        44652     40  
    -6.87       1014        43984     20  
    -6.54       1014        43803     20  
    -6.22       1014        44102     20  
    -6.12       1014        43773     20  
    -5.88       1014        44018     20  
    -4.67       1023        44235     20  
    -4.11       1023        43899     20  

Iniciar sesión para comentar.

Using accumarray to organize large CSV

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Respuesta aceptada

2 comentarios
Mostrar NingunoOcultar Ninguno

Más respuestas (1)

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

Ver también

Categorías

Etiquetas

Community Treasure Hunt

Using accumarray to organize large CSV

0 comentarios Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Respuesta aceptada

2 comentarios Mostrar NingunoOcultar Ninguno

Más respuestas (1)

1 comentario Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos

Ver también

Categorías

Etiquetas

Community Treasure Hunt

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

2 comentarios
Mostrar NingunoOcultar Ninguno

1 comentario
Mostrar -1 comentarios más antiguosOcultar -1 comentarios más antiguos