Rearranging rows side by side based on a column value

2 visualizaciones (últimos 30 días)
Sunny
Sunny el 7 de Feb. de 2019
Comentada: Sunny el 12 de Feb. de 2019
Hi,
I have an excel file with 300k samples (rows) and 40 columns. The first column is ID which has duplicate values and last column is about status and has binary values either 0 or 1.
I am looking to scan through this file and if the status column of first row is 0 it should copy the next row columns from 2 to 39 (excluising ID and status) and paste it where first row ends first row if the belong to same ID. This should happen for every other row with status 0 and it should copy only data related to same ID. Please see example below. From the expected output you can obvserve for ID 35 we didn't append anyt value for first sample as the status is 1 and for ID 35 the third sample is also not appended even if status is 0 as its the last row related to 35 and we cannot append ID 45 values,
ID Col1 Col2 Col3 Col4 Status
35 993 65 130 0 1
35 993 65 24 1 0
35 993 65 7 1 0
45 993 65 9 1 0
45 993 65 19 1 0
45 993 65 58 0 0
Expected Output:
ID Col1 Col2 Col3 Col 4 Status Col1 Col2 Col3 Col4
35 993 65 130 0 1
35 993 65 24 1 0 993 65 7 1
35 993 65 7 1 0
45 993 65 9 1 0 993 65 19 1
45 993 65 19 1 0 993 65 58 0
45 993 65 58 0 0
Thanks
  4 comentarios
Guillaume
Guillaume el 8 de Feb. de 2019
Oh, I didn't realise you wanted the filtered columns to be appended to the right of the same file. While it's perfectly doable, are you really sure you want this? I wouldn't think that repeated data and data with gaps in the rows is very practical? Wouldn't you rather have it as a separate file (with no gaps)?
As to the headers, it's up to you if you want them or not. It's a slightly different approach (table vs matrix) but the same amount of code either way.
Sunny
Sunny el 8 de Feb. de 2019
Editada: Sunny el 9 de Feb. de 2019
Guillaume thanks. i can have it as a seperate file . Your suggestion is correct as I would have deleted rows with gaps.

Iniciar sesión para comentar.

Respuesta aceptada

Guillaume
Guillaume el 11 de Feb. de 2019
I'm assuming that status 1 only happens once per ID. I'm also assuming that all rows of an ID are together.
t = readtable('Input_File.xlsx'); %read input data
[~, ~, subs] = unique(t.ID); %assign unique ID from 1 to n to each ID of table
hasstatus1 = accumarray(subs, t.Status, [], @any); %find IDs that have a status of 1
endrows = accumarray(subs, (1:height(t))', [], @max); %find last row of each ID
rowstodelete = t.Status == 1; %mark rows with status 1 for deletion
rowstodelete(endrows(hasstatus1)) = true; %and last row of ID which have a status of 1
From there, you can create a new table with only the rows and columns you want:
newtable = t(~rowstodelete, 2:end-1); %2:end-1 as you want to get rid of 1st and last column
writetable(newtable, 'NewFile.xlsx');
Or append to the existing table with gaps in row. This forces all appended columns to be cell arrays, which is more awkward:
newcontent = num2cell(t{:, 2:end-1});
newcontent(rowstodelete, :) = {[]};
newtable = [t, cell2table(newcontent, 'VariableNames', compose('%s_1', string(t.Properties.VariableNames(2:end-1))))];
writetable(newtable, 'NewFile2.xlsx');

Más respuestas (0)

Etiquetas

Productos


Versión

R2018b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by