How to create separate tables from a data set

34 visualizaciones (últimos 30 días)
Alexander Jakubiec
Alexander Jakubiec el 10 de Feb. de 2021
Comentada: dpb el 10 de Jun. de 2021
Working with gait analysis data, the exported raw file separates the data into four "tables" on top of each other (Joints, Model outputs, Segments and Trajectories).
When I use the readtable function, it identifies the variables from the top table (Joints) but not the other tables and instead has 5 rows of NaN separating the tables.
How do I get my code to identify and separate the data into easier to read tables.
(Attached is the file as a .txt, the file was too large to attach as .csv which is what I've been working with)
  3 comentarios
Alexander Jakubiec
Alexander Jakubiec el 10 de Feb. de 2021
Yeh you're right, that is much simpler. Thanks.
You can find it attached here
Image Analyst
Image Analyst el 10 de Feb. de 2021
Please explain what column numbers of that are supposed to be what table. And do you want an array of tables (one table for each patient) or do you want all patients in the same table with the patient ID as one of the columns (so there is just a single set of 4 tables)?

Iniciar sesión para comentar.

Respuesta aceptada

dpb
dpb el 10 de Feb. de 2021
Editada: dpb el 11 de Feb. de 2021
Boy! is that a mess! Why people/vendors do such things is beyond ken.
Another start...
G=readcell('Level Reduced.csv'); % bring in whole thing as cell array
SECTIONS={'Joints','Model','Segments','Trajectories'}; % the looked for data sections
ich=find(cellfun(@ischar,G(:,1))); % the records that have char() data first column
isec=ich(contains(G(ich,1),SECTIONS)); % the locations of each section beginning
isec=[isec;size(G,1)+2]; % add last line for indexing sections first:last lines
for i=1:numel(SECTIONS)
vn=string(G(isec(i)+3,:));
in=find(ismissing(vn));
vn(in)="Var"+in;
cmd=sprintf('t%s=cell2table(G(isec(%d)+5:isec(%d)-2,:))',SECTIONS{i},i,i+1);
eval(cmd)
end
The above results in four tables (I cut the size of the sample file down even further to just three records per group) named per the SECTIONS array; normally I would never use eval to "poof" variables into the workplace, but this seems to be the unusual case where it does make some sense. Code from here on would need to (and would presume would want to) use the explicit table names as it appears each group is different-enough as that code would not be duplicated.
tJoints =
3×25 table
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14 Var15 Var16 Var17 Var18 Var19 Var20 Var21 Var22 Var23 Var24 Var25
______ ____ _____ _____________ ______ _____ ______ _____ _____________ _____ ______ _____ ______ ______ _____ _____ _____ _____ _____ ______ _____ _____ ______ ______ _______
123.00 0.00 41.07 [1×1 missing] -23.61 -0.50 -52.06 11.76 [1×1 missing] 33.19 -23.97 26.34 -24.32 -19.67 0.78 -9.17 -4.21 3.50 23.78 -70.89 19.20 48.85 897.59 211.86 NaN
124.00 0.00 40.01 [1×1 missing] -23.56 -0.31 -51.82 11.77 [1×1 missing] 33.21 -23.97 26.64 -24.50 -19.55 0.99 -9.05 -4.14 3.77 23.71 -70.82 18.92 49.97 896.71 211.33 -106.03
125.00 0.00 38.65 [1×1 missing] -23.43 -0.10 -51.42 11.78 [1×1 missing] 33.20 -23.99 26.82 -24.56 -19.54 1.27 -8.98 -4.06 4.09 23.83 -70.81 18.81 50.99 896.06 210.57 -136.10
tModel =
3×25 table
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14 Var15 Var16 Var17 Var18 Var19 Var20 Var21 Var22 Var23 Var24 Var25
______ ____ ____ ____ ____ ____ _____ ____ _____________ _____________ _____________ _____________ _____________ _____________ _____________ _____________ _____________ _____ ______ ______ _____ ______ ______ ______ ______
123.00 0.00 8.94 0.00 0.00 8.97 -0.53 4.86 [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] 7.14 -10.27 -11.83 79.67 314.58 492.79 400.52 400.52
124.00 0.00 8.97 0.00 0.00 9.00 -0.47 4.71 [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] 7.11 -10.41 -11.56 79.93 314.35 493.12 400.61 400.61
125.00 0.00 8.93 0.00 0.00 8.96 -0.45 4.66 [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] [1×1 missing] 7.06 -10.53 -11.23 80.12 313.99 493.47 400.64 400.64
tSegments =
3×25 table
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14 Var15 Var16 Var17 Var18 Var19 Var20 Var21 Var22 Var23 Var24 Var25
______ ____ _____ _____ _____ ______ ______ ______ _____ ______ _____ _______ ______ ______ _____ _____ _____ _____ ______ ______ _____ ______ _____ _____ ______
123.00 0.00 59.21 54.50 77.89 -78.23 302.03 892.00 37.75 100.37 28.41 -122.44 343.55 135.89 89.18 85.27 58.57 41.15 322.18 500.28 37.75 100.37 28.41 79.86 342.85
124.00 0.00 58.96 54.49 78.23 -79.49 301.43 890.79 37.24 99.29 29.23 -114.01 343.27 131.59 88.12 84.58 59.54 41.39 322.36 499.57 37.24 99.29 29.23 89.46 344.06
125.00 0.00 58.88 54.47 78.42 -80.55 300.64 889.76 36.97 98.03 30.11 -105.09 343.29 126.97 87.06 83.63 60.52 40.94 322.21 498.77 36.97 98.03 30.11 99.54 345.12
tTrajectories =
3×25 table
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 Var10 Var11 Var12 Var13 Var14 Var15 Var16 Var17 Var18 Var19 Var20 Var21 Var22 Var23 Var24 Var25
______ ____ _____ ______ ______ _____ _____ ______ _______ ______ ______ _______ ______ ______ _____ ______ ______ _____ ______ ______ ______ ______ ______ ______ ______
123.00 0.00 50.81 359.27 970.07 42.66 46.04 946.57 -172.71 252.49 978.63 -173.44 164.97 976.32 56.60 392.62 730.53 92.55 379.65 502.24 -20.02 369.62 289.29 -75.38 361.01
124.00 0.00 50.13 358.90 970.59 42.10 45.82 946.35 -173.70 252.02 978.65 -174.31 164.49 976.29 55.85 392.45 730.88 92.50 379.48 502.53 -15.05 369.16 286.28 -67.52 361.31
125.00 0.00 49.65 358.41 971.07 41.52 45.60 946.12 -174.70 251.56 978.75 -175.16 164.03 976.30 55.03 392.30 731.20 92.30 379.21 502.81 -10.04 368.90 283.43 -59.52 361.58
>>
I tried to use the records beginning with 'Frame' as variable names, but that's fraught with issues -- and didn't have time to sort them all out...that's what the code for variable vn in the loop does; one would use
tName.Properties.VariableNames=vn;
% or
,'VariableNames',vn % add to argument list of |cell2table| for variable names
on creation, but
>> tTrajectories.Properties.VariableNames=vn;
Duplicate table variable name: 'X'.
>>
so one will have have some naming logic to build a unique set of names for each group of repeated names for each table. I didn't have the time to create that code at the moment.
The above code for vn does create a default VarN name for missing values that occur in at least the first section, but the duplicates was where I then had to resign from the fray somewhat bloodied but mostly just out of time to spare just now.
  3 comentarios
Alexander Jakubiec
Alexander Jakubiec el 10 de Jun. de 2021
Hi again,
The code you sent was brilliant, and worked wonders on the dataset i sent. However it seems to be struggling with this other dataset and I'm not sure why.
I'm getting the error
Index in position 1 exceeds array bounds (must not exceed 12895).
Error in Raw_Data_Split (line 8)
vn=string(G(isec(i)+3,:));
I figure its to do with the length of the tables they are creating as the structure of the dataset is the same the previous one. Whilst probably not ideal, I usually use the "try" function when the array bounds are exceeded, however in this instance it means only the first table (tJoints) is created.
Attached is the new dataset for reference. It has been reduced though.
dpb
dpb el 10 de Jun. de 2021
The file you attached ran to completion here...created the four expected tables.
Wherever the problem is, it must be in the part you didn't include in the posting.

Iniciar sesión para comentar.

Más respuestas (1)

Mathieu NOE
Mathieu NOE el 10 de Feb. de 2021
hello
here below my suggestion to split the large csv file into smaller txt files
they may be smarter use of regexp but this is what I can offer quick and dirty
seems you can now easily do a second loop to load these files with readtable and generate your multiples tables from there;
hope it helps, even if it can certainly be further refined
% Tell it what folder you want to put the files in
outdir = cd;
% Read the initial file in all at once
filename = 'Level Reduced.csv';
fid = fopen(filename, 'r');
data = fread(fid, '*char').';
fclose(fid);
% Break it into pieces based upon ',,,,,,,,,,,,,,,,,,,,,,,,' lines
pieces = regexp(data, '\n\s*\n', 'split');
[match,piece,startIndex,endIndex] = regexp(data,',,,,,,,,,,,,,,,,,,,,,,,,','match','split');
% Now loop through and save each one
n = 0;
for k = 1:numel(piece)
if size(piece{k},2) > 100 % so avoid storing empty files or single line (title)
n = n + 1;
filename = fullfile(outdir, ['out' num2str(n), '.txt']);
% Now write the piece to the file
fid = fopen(filename, 'w');
fwrite(fid, piece{k});
fclose(fid);
end
end
  1 comentario
Mathieu NOE
Mathieu NOE el 10 de Feb. de 2021
just noticed this line has to be removed :
pieces = regexp(data, '\n\s*\n', 'split');

Iniciar sesión para comentar.

Categorías

Más información sobre Data Type Conversion en Help Center y File Exchange.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by