How to conditionally and by groups subtract one row from another in table?

8 visualizaciones (últimos 30 días)
Hi
I want to add new rows in the table below. The new rows should be equal to 'Total energy supply' minus 'Total final consumption' for each country in the table.
Normally, I would use varfun() to find the sum of groups. But I can't figure out how to apply varfun() conditionally, or what do to when substracting instead of adding two rows.
Here is some code that reproduces the table:
country = categorical(["USA";"USA";"USA"; "Canada"; "Canada"; "Canada"]);
sector = categorical(["supply"; "consumption"; "other";"supply"; "consumption"; "other"]);
coal = [1;2;3;4;5;6];
oil = [1;2;3;4;5;6];
T = table(country, sector, coal, oil)
T = 6×4 table
country sector coal oil _______ ___________ ____ ___ USA supply 1 1 USA consumption 2 2 USA other 3 3 Canada supply 4 4 Canada consumption 5 5 Canada other 6 6
Any help is much appreciated.
Thanks.
  2 comentarios
Andreas Hoel-Holt
Andreas Hoel-Holt el 4 de Mayo de 2022
Editada: Cris LaPierre el 4 de Mayo de 2022
I would like the output to be like the table "Tres" below. It has a new row for each country with the value of supply minus demand for each fuel.
country = categorical(["USA";"USA";"USA"; "USA"; ...
"Canada"; "Canada"; "Canada"; "Canada"]);
sector = categorical(["supply"; "consumption"; "other"; "net supply"; ...
"supply"; "consumption"; "other"; "net supply"]);
coal = [7;8;9;-1;10;11;12;-1];
oil = [1;2;3;-1;4;5;6;-1];
Tres = table(country, sector, coal, oil)
Tres = 8×4 table
country sector coal oil _______ ___________ ____ ___ USA supply 7 1 USA consumption 8 2 USA other 9 3 USA net supply -1 -1 Canada supply 10 4 Canada consumption 11 5 Canada other 12 6 Canada net supply -1 -1
I have tried using unstack-stack to create these new variables, but I can't figure out how to keep the stacked variable names instead of numerical indices under "sector" in the result table "TS":
country = categorical(["USA";"USA";"USA"; ...
"Canada"; "Canada"; "Canada"]);
sector = categorical(["supply"; "consumption"; "other"; ...
"supply"; "consumption"; "other"]);
coal = [7;8;9;10;11;12];
oil = [1;2;3;4;5;6];
T = table(country, sector, coal, oil);
TU = unstack(T, ["coal" "oil"],"sector");
TU.coalNet = TU.coal_supply - TU.coal_consumption;
TU.oilNet = TU.oil_supply - TU.oil_consumption;
vars = {[2 3 4 8],[5 6 7 9]};
TS = stack(TU, vars, ...
"IndexVariableName", "sector", ...
'NewDataVariableName', {'coal', 'oil'})
TS = 8×4 table
country sector coal oil _______ ______ ____ ___ USA 2 8 2 USA 3 9 3 USA 4 7 1 USA 8 -1 -1 Canada 2 11 5 Canada 3 12 6 Canada 4 10 4 Canada 8 -1 -1
If someone knows how to keep the indices as strings (or categories), or how to this more elegantly, it is much appreciated.
Thank you.

Iniciar sesión para comentar.

Respuesta aceptada

Jon
Jon el 4 de Mayo de 2022
Is this what you want:
country = categorical(["USA";"USA";"USA"; "Canada"; "Canada"; "Canada"]);
sector = categorical(["supply"; "consumption"; "other";"supply"; "consumption"; "other"]);
coal = [1;2;3;4;5;6];
oil = [1;2;3;4;5;6];
T = table(country, sector, coal, oil)
% Find the available countries
countries = unique(T.country)
% Loop through the available countries computing the net consumption for
% each
endRow = size(T,1); % current ending row number
for k = 1:numel(countries)
% sum the supplies and consumption
idl = T.sector == "supply" & T.country == countries(k);
totalSupply = sum(T{idl,3:end},1);
idl = T.sector == "consumption" & T.country == countries(k);
totalConsumption = sum(T{idl,3:end},1);
% compute net
net = totalSupply -totalConsumption;
% and add row
endRow = endRow + 1;
newRow = T(find(idl),:); % base on any of the current matching rows
newRow.sector = "net";
newRow{1,3:end} = net;
T(endRow,:) = newRow;
end
T =
8×4 table
country sector coal oil
_______ ___________ ____ ___
USA supply 1 1
USA consumption 2 2
USA other 3 3
Canada supply 4 4
Canada consumption 5 5
Canada other 6 6
Canada net -1 -1
USA net -1 -1
  2 comentarios
Jon
Jon el 4 de Mayo de 2022
Glad it is working. I don't have any experience using categorical variables. So, maybe there is some other more elegant way to do this using their special functionality, but at least this is one approach.

Iniciar sesión para comentar.

Más respuestas (1)

Cris LaPierre
Cris LaPierre el 4 de Mayo de 2022
Editada: Cris LaPierre el 4 de Mayo de 2022
I think I would calculate the results to a new table using groupsummary, then concatenate the tables. This is a little longer than I was hoping, but figured I'd address the smaller issues. Once it's automated, it doesn't really matter.
country = categorical(["USA";"USA";"USA"; "Canada"; "Canada"; "Canada"]);
sector = categorical(["supply"; "consumption"; "other";"supply"; "consumption"; "other"]);
coal = [1;2;3;4;5;6];
oil = [1;2;3;4;5;6];
T = table(country, sector, coal, oil)
T = 6×4 table
country sector coal oil _______ ___________ ____ ___ USA supply 1 1 USA consumption 2 2 USA other 3 3 Canada supply 4 4 Canada consumption 5 5 Canada other 6 6
% Change consumption values to negative
newT = T;
ind = newT.sector == "consumption";
newT.coal(ind) = -newT.coal(ind);
newT.oil(ind) = -newT.oil(ind)
newT = 6×4 table
country sector coal oil _______ ___________ ____ ___ USA supply 1 1 USA consumption -2 -2 USA other 3 3 Canada supply 4 4 Canada consumption -5 -5 Canada other 6 6
% Remove 'other'
ind = newT.sector == "other";
newT(ind,:) = [];
% calculate net supply by country
net = groupsummary(newT,"country",'sum',["coal","oil"])
net = 2×4 table
country GroupCount sum_coal sum_oil _______ __________ ________ _______ Canada 2 -1 -1 USA 2 -1 -1
% format table
net = removevars(net,"GroupCount");
net.sector(:) = categorical("net supply");
net.Properties.VariableNames(2:3) = ["coal","oil"]
net = 2×4 table
country coal oil sector _______ ____ ___ __________ Canada -1 -1 net supply USA -1 -1 net supply
% combine tables
T = [T;net];
T = sortrows(T,"country")
T = 8×4 table
country sector coal oil _______ ___________ ____ ___ Canada supply 4 4 Canada consumption 5 5 Canada other 6 6 Canada net supply -1 -1 USA supply 1 1 USA consumption 2 2 USA other 3 3 USA net supply -1 -1
  2 comentarios
Jon
Jon el 4 de Mayo de 2022
@Cris LaPierre I liked learning from your example how groupsummary could be applied for this kind of problem. I can see that functionality can be quite powerful. Thanks!

Iniciar sesión para comentar.

Categorías

Más información sobre Data Type Conversion en Help Center y File Exchange.

Etiquetas

Productos


Versión

R2022a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by