Combining similarly named variables

Corey McDowell on 29 Jun 2022
Edited: Vatsal on 29 Sep 2023
In a dataset I have variables that are functionally identical but have slightly different names because they were imported from different machines. One example is:
'chest_abd_pelvis_w_contrast_over_50kg' & 'cap_w_contrast_over_50kg'
When doing group analysis, it is often better to treat these as a single variable. I have been able to merge them one pair at a time using the regexp-based method below:
protocols = groupcounts(B,"Protocol");
protocols = sortrows(protocols,"GroupCount","descend")
idx1 = ~cellfun(@isempty,(regexp(protocols.Protocol(:),'(chest.*abd.*pel.*over.*50|cap.*w.*over.*50)')));
B.idx1 = ismember(B.Protocol,protocols.Protocol(idx1));
B.Protocol(B.idx1) = {'CAP w/ contrast over 50 kg'};
B{:,(~cellfun(@isempty,(strfind(B.Properties.VariableNames,'idx'))))} = []
The naming differences come in many forms, so I do not expect to be able to group all of them at once. However, some of these merges have to be repeated several times. For example, alongside the pair above there is also:
'chest_abd_pelvis_w_contrast_21_to_50kg' & 'cap_w_contrast_21_to_50kg'
Is there a way to merge the two "over 50" names together and the two "21 to 50" names together simultaneously?
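As an aside, a single merge like the one in the question does not strictly need the intermediate groupcounts table, the ismember lookup, or a temporary idx column: the same pattern can be matched against every row of B.Protocol directly. The sketch below assumes B.Protocol is a cell array of character vectors (as the {...} assignments in the question imply); the three-row table is hypothetical toy data.

```matlab
% Hypothetical toy stand-in for the Protocol column of table B.
B = table({'chest_abd_pelvis_w_contrast_over_50kg'; ...
           'cap_w_contrast_over_50kg'; ...
           'head_wo_contrast'}, 'VariableNames', {'Protocol'});

% Same pattern as in the question, applied to every row of B at once.
pattern = '(chest.*abd.*pel.*over.*50|cap.*w.*over.*50)';
% With 'once', regexp returns '' for each row that does not match.
isMatch = ~cellfun(@isempty, regexp(B.Protocol, pattern, 'once'));

% Overwrite all matching rows with one canonical name.
B.Protocol(isMatch) = {'CAP w/ contrast over 50 kg'};
disp(B.Protocol)
```

The groupcounts/sortrows step is still handy for inspecting which names exist and how often, but it is not required for the merge itself.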

Answers (1)

Vatsal on 21 Sep 2023
Edited: Vatsal on 29 Sep 2023
I understand that your dataset contains names that are functionally identical but spelled differently. For group analysis you want each such set of names treated as a single value, and you want to do this for more than one set at the same time.
If the goal is to merge the two "over 50" names with each other and the two "21 to 50" names with each other, rather than merging all four into one, you need two separate regexp patterns: one that matches the two "over 50" names and another that matches the two "21 to 50" names.
Here is the updated code for reference:
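Before running the merge on real data, it can help to confirm that the two patterns are disjoint, i.e. that each one matches only its own pair. The sketch below tests both patterns against the four example names from the question; the variable names are illustrative only.

```matlab
% The four example names from the question.
names = {'chest_abd_pelvis_w_contrast_over_50kg'; ...
         'cap_w_contrast_over_50kg'; ...
         'chest_abd_pelvis_w_contrast_21_to_50kg'; ...
         'cap_w_contrast_21_to_50kg'};

over50Pattern = '(chest.*abd.*pel.*over.*50|cap.*w.*over.*50)';
to50Pattern   = '(chest.*abd.*pel.*21.*50|cap.*w.*21.*50)';

% Logical column vectors: true where a name matches the pattern.
isOver50 = ~cellfun(@isempty, regexp(names, over50Pattern, 'once'));
is21to50 = ~cellfun(@isempty, regexp(names, to50Pattern, 'once'));

% One column per pattern; each row should have exactly one true value.
disp([isOver50 is21to50])
```

The "over 50" pattern requires the literal text over, and the "21 to 50" pattern requires 21, so neither pattern picks up the other pair. Note that regexp is case-sensitive; regexpi is the case-insensitive variant if the machines also differ in capitalization.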
% Count how often each protocol name occurs, most frequent first
protocols = groupcounts(B, "Protocol");
protocols = sortrows(protocols, "GroupCount", "descend");

% Merge the two "over 50 kg" names into one
idx_over_50 = ~cellfun(@isempty, regexp(protocols.Protocol(:), '(chest.*abd.*pel.*over.*50|cap.*w.*over.*50)'));
B.idx_over_50 = ismember(B.Protocol, protocols.Protocol(idx_over_50));
B.Protocol(B.idx_over_50) = {'CAP w/ contrast over 50 kg'};

% Merge the two "21 to 50 kg" names into one
idx_21_to_50 = ~cellfun(@isempty, regexp(protocols.Protocol(:), '(chest.*abd.*pel.*21.*50|cap.*w.*21.*50)'));
B.idx_21_to_50 = ismember(B.Protocol, protocols.Protocol(idx_21_to_50));
B.Protocol(B.idx_21_to_50) = {'CAP w/ contrast 21 to 50 kg'};

% Remove the temporary index columns (parentheses indexing deletes table variables)
B(:, contains(B.Properties.VariableNames, 'idx')) = [];
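If more pairs need merging later, one option is to keep each pattern and its canonical name together in a single lookup and apply them all in one loop, so adding a pair means adding one row rather than three more lines of code. This is a minimal sketch, assuming B.Protocol is a cell array of character vectors as the {...} assignments above imply; the toy table and the name rules are hypothetical.

```matlab
% Hypothetical toy stand-in for table B.
B = table({'chest_abd_pelvis_w_contrast_over_50kg'; ...
           'cap_w_contrast_over_50kg'; ...
           'chest_abd_pelvis_w_contrast_21_to_50kg'; ...
           'cap_w_contrast_21_to_50kg'; ...
           'head_wo_contrast'}, 'VariableNames', {'Protocol'});

% Each row: {regexp pattern, canonical name}. Add rows for further pairs.
rules = {
    '(chest.*abd.*pel.*over.*50|cap.*w.*over.*50)', 'CAP w/ contrast over 50 kg'
    '(chest.*abd.*pel.*21.*50|cap.*w.*21.*50)',     'CAP w/ contrast 21 to 50 kg'
};

% Match against a copy of the original names, so a row renamed by one
% rule cannot be matched again by a later rule.
original = B.Protocol;
for k = 1:size(rules, 1)
    isMatch = ~cellfun(@isempty, regexp(original, rules{k, 1}, 'once'));
    % rules(k, 2) is a 1x1 cell, expanded to every matching row.
    B.Protocol(isMatch) = rules(k, 2);
end
disp(B.Protocol)
```

Running groupcounts(B, "Protocol") afterwards should list each merged name once, with the combined count of its original spellings.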
See the MATLAB documentation for regexp for more information on its usage and syntax.

Version

R2022a
