How to segment a dataset, randomly sample from each segment, and append the datapoints?
I'm attempting to recreate an example from this paper. The method is essentially segmented random sampling: it is similar to sampling across the entire dataset, but it ensures that each segment has an equal chance of being represented.
Assume there's a table:
T(:,1) = [3.0, 5.6, 10.2, 12.0, 14.4, 15.6];
T(:,2) = ["08-Feb-2019 12:34:52", "11-Feb-2019 16:07:17", "16-Feb-2019 14:50:31", "20-Feb-2019 05:43:51", "25-Feb-2019 07:55:24", "02-Mar-2019 11:06:27"];
The table is divided into s=3 segments, resulting in the following divisions:
Seg1(:,1) = [3.0, 5.6];
Seg1(:,2) = ["08-Feb-2019 12:34:52", "11-Feb-2019 16:07:17"];
Seg2(:,1) = [10.2, 12.0];
Seg2(:,2) = ["16-Feb-2019 14:50:31", "20-Feb-2019 05:43:51"];
Seg3(:,1) = [14.4, 15.6];
Seg3(:,2) = ["25-Feb-2019 07:55:24", "02-Mar-2019 11:06:27"];
I need to randomly sample n-1 out of the n datapoints in each segment, repeating the random sampling nCr times per segment with r = n-1 (i.e., select 1 datapoint out of 2 from Seg1, then repeat once more so that Seg1 is sampled 2C1 = 2 times. Repeat for each segment).
Then, to create the new datasets, we append the randomly selected datapoints from each segment. This should result in (nCr)^s new datasets, or in this case (2C1)^3 = 8 new datasets. An example of one dataset is:
T1(:,1) = [3.0, 10.2, 14.4];
T1(:,2) = ["08-Feb-2019 12:34:52","16-Feb-2019 14:50:31", "25-Feb-2019 07:55:24"];
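The dataset count quoted above can be checked directly with nchoosek. This is only a minimal sketch, assuming equally sized segments; the variable names n, r, s, perSegment and numDatasets are illustrative and not taken from the paper.

```matlab
n = 2;        % datapoints per segment
r = n - 1;    % datapoints sampled from each segment
s = 3;        % number of segments

perSegment  = nchoosek(n, r);   % distinct ways to choose r of n rows in one segment
numDatasets = perSegment ^ s;   % one choice per segment, combined across all segments

fprintf('%d choices per segment, %d datasets in total\n', perSegment, numDatasets);
```

For the example table this gives 2 choices per segment and 8 datasets in total.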
This is my attempt to code the above.
numRows=size(T,1); %Establish total number of rows
numSeg = 3; % Split it into 3 segments
splitIndex = floor(numRows/numSeg); % Number of datapoints in each segment
% Splitting the table into numSeg segments with splitIndex number of datapoints
for m = 1:splitIndex:numRows
for i = length(numSeg)
Seg{i}= T(m);
end
end
% Randomly sample splitIndex-1 datapoints from each segment, repeat
% splitIndex choose splitIndex-1 times
for i = length(numSeg)
datasample(Seg{i})
T{i} = concat(Seg{i})
end
I'm especially struggling with randomly sampling from each segment and then matching up the sampled datapoints so that they can be appended across the segments. Thank you!
Answers (1)
Image Analyst
on 23 Apr 2024
Try this:
% Create initial full table.
col1 = [3.0; 5.6; 10.2; 12.0; 14.4; 15.6];
col2 = ["08-Feb-2019 12:34:52"; "11-Feb-2019 16:07:17"; "16-Feb-2019 14:50:31"; "20-Feb-2019 05:43:51"; "25-Feb-2019 07:55:24"; "02-Mar-2019 11:06:27"]; % String column
T = table(col1, col2)
% Get a list of randomly chosen rows with none repeated and none missing.
numSeg = 3; % Split it into 3 segments
numRows = height(T);
splitIndex = floor(numRows/numSeg); % Number of datapoints in each segment
randomRows = randperm(numRows)
% Create 3 tables with random rows with no repeated rows.
t1 = T(randomRows(1:splitIndex), :)
t2 = T(randomRows(splitIndex+1: 2*splitIndex), :)
t3 = T(randomRows(2*splitIndex+1:3*splitIndex), :)
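The three hard-coded tables above could also be built in a loop so that the code follows numSeg. This is a sketch under the same assumptions as the answer (the row count divides evenly by numSeg); the cell array name shuffledGroups is illustrative. Note that randperm here shuffles across the whole table, so each group is a random subset of all rows rather than a contiguous segment.

```matlab
col1 = [3.0; 5.6; 10.2; 12.0; 14.4; 15.6];
T = table(col1);

numSeg = 3;
numRows = height(T);
splitIndex = floor(numRows / numSeg);   % rows per group
randomRows = randperm(numRows);         % every row index exactly once, random order

shuffledGroups = cell(numSeg, 1);       % preallocate one cell per group
for k = 1 : numSeg
    idx = randomRows((k - 1) * splitIndex + 1 : k * splitIndex);
    shuffledGroups{k} = T(idx, :);
end
```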
6 comments
Joy
on 24 Apr 2024
Image Analyst
on 24 Apr 2024
If you've already constructed Seg in the way you want and just want to pick one of them at random then you can get a random index and use that one.
randomIndex = randi(numel(Seg))
SegToUse = Seg{randomIndex} % Braces return the segment itself rather than a 1-by-1 cell
Of course, if you want to do it over and over again, you can put it in a loop. If you want to make sure you never use the same one twice, you can use randperm:
randomIndexes = randperm(numel(Seg));
for k = 1 : whateverYouWant % Must not exceed numel(Seg)
    SegToUse = Seg{randomIndexes(k)}
end
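A related case is drawing rows at random inside every segment, rather than picking one whole segment. The following is a minimal sketch, assuming Seg is a cell array of equally structured tables; the example data, the variable name value, and the names parts and oneDataset are illustrative.

```matlab
% Illustrative segments built from the numeric column in the question.
Seg = {table([3.0; 5.6],   'VariableNames', {'value'}), ...
       table([10.2; 12.0], 'VariableNames', {'value'}), ...
       table([14.4; 15.6], 'VariableNames', {'value'})};

parts = cell(numel(Seg), 1);
for k = 1 : numel(Seg)
    n = height(Seg{k});
    keep = sort(randperm(n, n - 1));   % n-1 distinct row indices, kept in original order
    parts{k} = Seg{k}(keep, :);
end
oneDataset = vertcat(parts{:});        % one randomly drawn dataset, segments in order
```

Each run produces a single random dataset; it does not guarantee that repeated runs cover every possible combination.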
And you might want to read this:
Joy
on 24 Apr 2024
Image Analyst
on 24 Apr 2024
Then try this:
col1 = [1; 2; 3; 4; 5; 6; 7; 8; 9];
col2 = ["08-Feb-2019 12:34:52"; "11-Feb-2019 16:07:17"; "16-Feb-2019 14:50:31"; "20-Feb-2019 05:43:51"; "25-Feb-2019 07:55:24"; "02-Mar-2019 11:06:27"; "04-Mar-2019 11:06:27"; "06-Mar-2019 11:06:27"; "08-Mar-2019 11:06:27"]; % String column
T = table(col1, col2)
% Work out how many rows go into each segment.
numSeg = 3; % Split it into 3 segments
numRows = height(T);
splitIndex = floor(numRows/numSeg) % Number of datapoints in each segment
% Split the table into numSeg contiguous segments, one table per cell.
segments = cell(numSeg, 1); % Preallocate.
for k = 1 : numSeg
    index1 = (k - 1) * splitIndex + 1 % First row of the k'th segment
    index2 = k * splitIndex % Last row of the k'th segment
    segments{k} = T(index1:index2, :);
end
% Now for each segment, get the table with the rows randomly ordered.
randomlyOrderedSegments = cell(size(segments)); % Preallocate.
for k = 1 : numel(segments)
    thisSegment = segments{k}; % thisSegment is a table, not a cell
    % Get a random ordering of rows within this k'th segment.
    randomRows = randperm(height(thisSegment));
    % Reorder the segment according to that random ordering and
    % put into a new cell array called randomlyOrderedSegments.
    randomlyOrderedSegments{k} = thisSegment(randomRows, :)
end
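To go from segments to the full set of (nCr)^s datasets described in the question, one option is to enumerate every combination instead of drawing at random. The sketch below is one possible reading of the question, not a confirmed implementation of the paper's method. It assumes equally sized contiguous segments with n >= 2 rows each, uses the six-row table from the question, and the names value, stamp, picks, combos and datasets are illustrative.

```matlab
% Example table from the question.
value = [3.0; 5.6; 10.2; 12.0; 14.4; 15.6];
stamp = ["08-Feb-2019 12:34:52"; "11-Feb-2019 16:07:17"; "16-Feb-2019 14:50:31"; ...
         "20-Feb-2019 05:43:51"; "25-Feb-2019 07:55:24"; "02-Mar-2019 11:06:27"];
T = table(value, stamp);

numSeg = 3;
n = floor(height(T) / numSeg);   % rows per segment
r = n - 1;                       % rows kept from each segment

% Every way of choosing r of the n rows, as indices local to a segment.
picks = nchoosek(1:n, r);        % nchoosek(n, r)-by-r matrix
numPicks = size(picks, 1);

% Every combination of one pick per segment (numPicks^numSeg rows).
grids = cell(1, numSeg);
[grids{:}] = ndgrid(1:numPicks);
combos = cell2mat(cellfun(@(g) g(:), grids, 'UniformOutput', false));

datasets = cell(size(combos, 1), 1);
for d = 1 : size(combos, 1)
    rows = [];
    for k = 1 : numSeg
        offset = (k - 1) * n;    % rows that precede segment k in T
        rows = [rows, offset + picks(combos(d, k), :)]; %#ok<AGROW>
    end
    datasets{d} = T(rows, :);    % append the chosen rows, segment by segment
end
```

With this data, datasets holds 8 tables, and datasets{1} contains rows 1, 3 and 5, which matches the example T1 in the question. If a random order is wanted, the list can be shuffled afterwards with datasets(randperm(numel(datasets))).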
Joy
on 24 Apr 2024
Joy
on 25 Apr 2024