How do I assign rows of a variable to categories?

Maximilian Fenski on 20 Apr 2022
Answered: Vatsal on 29 Sep 2023
Hello,
I have a table ("data") that consists of 4 variables (688 rows). This is what the first 6 rows look like:
Pseudonym    Indication  Study-name  Sequence
Patient_001  1           1           1
Patient_002  2           2           2
Patient_003  3           3           1
Patient_004  3           1           1
Patient_005  4           2           2
Patient_006  4           5           2
I want to find all groups defined by "Indication", "Study-name" and "Sequence".
I created a new table, data1 = data(:,{'indication' 'study_name' 'sequence'}), and then used
[p,v] = findgroups(data1) to find all groups that occur.
Now I want to assign each row in "Pseudonym" to one of these groups.
My goal is to create a new variable for every group, containing all pseudonyms that belong to that group.
In the next step I want to randomly pick pseudonyms from each group.
Furthermore, I would like to take the group size (i.e. the number of pseudonyms in a group) into account.
That means that if I want to randomly pick 20 patients across all groups and one group contains 50% of the data, then 10 patients should be picked from that group.
Could you please help me set up the code?
Thank you so much!
Max

Answers (1)

Vatsal on 29 Sep 2023
I understand that you have a table "data" with four columns and that you want to find the groups defined by the columns "Indication", "Study-name" and "Sequence". Once the groups are found, you want to assign each row in "Pseudonym" to one of them.
After that, you want to randomly pick a total of "x" pseudonyms across all groups, with each group contributing in proportion to its size.
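As a rough illustration of the grouping step, the sketch below collects the pseudonyms of each group next to the group definition returned by findgroups. The table is a hypothetical stand-in: the first six rows follow the question, Patient_007 and Patient_008 are invented so that two groups have more than one member, and 'StudyName' is an assumed identifier-safe spelling of "Study-name".

```matlab
% Hypothetical sample data; rows 7 and 8 are made up for illustration.
data = table( ...
    ["Patient_001"; "Patient_002"; "Patient_003"; "Patient_004"; ...
     "Patient_005"; "Patient_006"; "Patient_007"; "Patient_008"], ...
    [1; 2; 3; 3; 4; 4; 1; 4], ...
    [1; 2; 3; 1; 2; 5; 1; 2], ...
    [1; 2; 1; 1; 2; 2; 1; 2], ...
    'VariableNames', {'Pseudonym', 'Indication', 'StudyName', 'Sequence'});

% p: group number for every row of data
% v: table with one row per unique (Indication, StudyName, Sequence) combination
[p, v] = findgroups(data(:, {'Indication', 'StudyName', 'Sequence'}));

% Attach the size and the members of each group to its definition
v.GroupCount = accumarray(p, 1);
v.Members = splitapply(@(x) {x}, data.Pseudonym, p);

disp(v(:, {'Indication', 'StudyName', 'Sequence', 'GroupCount'}))
```

With this layout, v.Members{k} holds all pseudonyms of group k, so a separate workspace variable per group is not needed.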
The code below randomly picks pseudonyms from all groups while taking the group size into account:
% Adjust the variable names so that they match data.Properties.VariableNames
data1 = data(:, {'Indication', 'Study-name', 'Sequence'});
[p, v] = findgroups(data1);   % p: group number per row, v: one row per group
groups = splitapply(@(x) {x}, data.Pseudonym, p);   % one cell per group
numPicks = 20; % Number of pseudonyms to pick in total
pickedPseudonyms = [];
groupSizes = cellfun(@numel, groups);
scalingFactor = numPicks / sum(groupSizes);
[~, sortedIndices] = sort(groupSizes, 'descend');
sortedGroups = groups(sortedIndices);
for i = 1:numel(sortedGroups)
    groupSize = numel(sortedGroups{i});
    % Proportional share, capped by the picks that are still available.
    % Because of rounding, the final total can fall slightly short of numPicks.
    remaining = numPicks - numel(pickedPseudonyms);
    picksFromGroup = min([round(groupSize * scalingFactor), groupSize, remaining]);
    if picksFromGroup > 0
        randomIndices = randperm(groupSize, picksFromGroup);
        % Pseudonym is a column, so concatenate vertically
        pickedPseudonyms = [pickedPseudonyms; sortedGroups{i}(randomIndices)];
    end
    % Stop once numPicks pseudonyms are selected
    if numel(pickedPseudonyms) >= numPicks
        break;
    end
end
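Rounding each group's share on its own means the per-group picks may not add up to exactly numPicks. One possible alternative, not part of the original answer, is a largest-remainder allocation: floor every exact share, then give the leftover picks to the groups with the largest fractional parts. The sketch below assumes numPicks does not exceed the number of rows and uses a hypothetical randomly generated table with assumed variable names.

```matlab
% Hypothetical stand-in for the real table (names and values are assumptions)
rng(0);                                   % reproducible example
n = 40;
data = table("Patient_" + compose("%03d", (1:n)'), ...
             randi(3, n, 1), randi(2, n, 1), ...
             'VariableNames', {'Pseudonym', 'Indication', 'Sequence'});

[p, v] = findgroups(data(:, {'Indication', 'Sequence'}));
groupSizes = accumarray(p, 1);            % number of rows per group
numPicks = 20;

% Largest-remainder allocation
exactShare = numPicks * groupSizes / sum(groupSizes);
picks = floor(exactShare);
[~, order] = sort(exactShare - picks, 'descend');
leftover = numPicks - sum(picks);
picks(order(1:leftover)) = picks(order(1:leftover)) + 1;

% Draw the allocated number of pseudonyms from each group
picked = strings(0, 1);
for g = 1:numel(picks)
    if picks(g) > 0
        members = data.Pseudonym(p == g);
        idx = randperm(numel(members), picks(g));
        picked = [picked; members(idx)];
    end
end
```

Here picks(g) is the number of pseudonyms drawn from the group described by row g of v, and the picks sum to numPicks by construction.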
You can also refer to the MATLAB documentation for "randperm" for more information on its usage and syntax.
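As a small illustration of the two-argument form used here: randperm(n, k) returns k distinct integers between 1 and n, which can serve as indices for sampling without replacement. The group contents below are hypothetical.

```matlab
% Hypothetical members of one group
group = ["Patient_003"; "Patient_004"; "Patient_010"; "Patient_021"; "Patient_030"];
k = 2;                                  % number of pseudonyms to draw

idx = randperm(numel(group), k);        % k distinct indices in 1..numel(group)
sample = group(idx);                    % no pseudonym is drawn twice
```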
I hope this helps!
