How to segment a dataset, randomly sample from each segment, and append the datapoints?

I'm attempting to recreate an example from this paper. It is essentially segmented random sampling: similar to sampling across the entire dataset, but ensuring that each segment has an equal chance of being represented.
Assume there's a table:
T(:,1) = [3.0, 5.6, 10.2, 12.0, 14.4, 15.6];
T(:,2) = ["08-Feb-2019 12:34:52", "11-Feb-2019 16:07:17", "16-Feb-2019 14:50:31", "20-Feb-2019 05:43:51", "25-Feb-2019 07:55:24", "02-Mar-2019 11:06:27"];
The table is divided into s=3 segments, resulting in the following divisions:
Seg1(:,1) = [3.0, 5.6];
Seg1(:,2) = ["08-Feb-2019 12:34:52", "11-Feb-2019 16:07:17"];
Seg2(:,1) = [10.2, 12.0];
Seg2(:,2) = ["16-Feb-2019 14:50:31", "20-Feb-2019 05:43:51"];
Seg3(:,1) = [14.4, 15.6];
Seg3(:,2) = ["25-Feb-2019 07:55:24", "02-Mar-2019 11:06:27"];
I need to randomly sample n-1 out of n of the datapoints in each segment, repeating the random sampling nCr times for each segment (e.g., select 1 datapoint out of 2 from Seg1, then repeat so that Seg1 is sampled 2C1 = 2 times in total, i.e., 1 more time; do the same for each remaining segment).
Then to create new datasets T, we append each randomly selected datapoint from each segment. This should result in (nCr)^s new datasets, or in this case (2C1)^3=8 new datasets. An example of one dataset is:
T1(:,1) = [3.0, 10.2, 14.4];
T1(:,2) = ["08-Feb-2019 12:34:52","16-Feb-2019 14:50:31", "25-Feb-2019 07:55:24"];
This is my attempt to code the above.
numRows=size(T,1); %Establish total number of rows
numSeg = 3; % Split it into 3 segments
splitIndex = floor(numRows/numSeg); % Number of datapoints in each segment
% Splitting the table into numSeg segments with splitIndex number of datapoints
for m = 1:splitIndex:numRows
for i = length(numSeg)
Seg{i}= T(m);
end
end
% Randomly sample splitIndex-1 datapoints from each segment, repeat
% splitIndex choose splitIndex-1 times
for i = length(numSeg)
datasample(Seg{i})
T{i} = concat(Seg{i})
end
I'm especially struggling with randomly sampling from each segment and then matching up the randomly sampled datapoints so they can be appended across the segments. Thank you!

Answers (1)

Try this:
% Create initial full table.
col1 = [3.0; 5.6; 10.2; 12.0; 14.4; 15.6];
col2 = {"08-Feb-2019 12:34:52"; "11-Feb-2019 16:07:17"; "16-Feb-2019 14:50:31"; "20-Feb-2019 05:43:51"; "25-Feb-2019 07:55:24"; "02-Mar-2019 11:06:27"};
T = table(col1, col2)
% Get a list of randomly chosen rows with none repeated and none missing.
numSeg = 3; % Split it into 3 segments
numRows = height(T);
splitIndex = floor(numRows/numSeg); % Number of datapoints in each segment
randomRows = randperm(numRows)
% Create 3 tables with random rows with no repeated rows.
t1 = T(randomRows(1:splitIndex), :)
t2 = T(randomRows(splitIndex+1: 2*splitIndex), :)
t3 = T(randomRows(2*splitIndex+1:3*splitIndex), :)

6 Comments

Thanks for this start. If I'm not misunderstanding your code, the randomRows randomly selects a row across the entire dataset, rather than just each segment.
I think I've edited it, but I can only get it to select from the first segment. Furthermore, is there a way to turn this into a for loop in case my table gets much larger?
% Create initial full table.
col1 = [3.0; 5.6; 10.2; 12.0; 14.4; 15.6];
col2 = {"08-Feb-2019 12:34:52"; "11-Feb-2019 16:07:17"; "16-Feb-2019 14:50:31"; "20-Feb-2019 05:43:51"; "25-Feb-2019 07:55:24"; "02-Mar-2019 11:06:27"};
T = table(col1, col2)
% Get a list of randomly chosen rows with none repeated and none missing.
numSeg = 3; % Split it into 3 segments
numRows = height(T);
splitIndex = floor(numRows/numSeg); % Number of datapoints in each segment
randomSeg1 = randperm(splitIndex,splitIndex-1); % Randomly select from segment 1
randomSeg2 = randperm(splitIndex+1,2*splitIndex); % Randomly select from segment 2
randomSeg3 = randperm(2*splitIndex+1,3*splitIndex); % Randomly select from segment 3
% Create 3 tables with random rows from each segment
t1 = T(randomSeg1,:)
t2 = T(randomSeg2,:)
t3 = T(randomSeg3,:)
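A note on why the edited version stops at the first segment: randperm(n,k) returns k distinct integers drawn from 1 to n, so randperm(splitIndex+1, 2*splitIndex) asks for more values than are available instead of selecting from a row range. A minimal sketch of one way around this, assuming equal-sized contiguous segments, is to sample local positions within a segment and then add that segment's row offset. The names offset, localRows, and sampled are illustrative and do not come from the thread, and col2 is stored as a string array here for brevity.

```matlab
% Example table from the thread (col2 stored as a string array here).
col1 = [3.0; 5.6; 10.2; 12.0; 14.4; 15.6];
col2 = ["08-Feb-2019 12:34:52"; "11-Feb-2019 16:07:17"; "16-Feb-2019 14:50:31"; ...
        "20-Feb-2019 05:43:51"; "25-Feb-2019 07:55:24"; "02-Mar-2019 11:06:27"];
T = table(col1, col2);

numSeg = 3;
splitIndex = floor(height(T) / numSeg);   % rows per segment
sampled = cell(numSeg, 1);                % illustrative name: one sampled table per segment
for k = 1:numSeg
    offset = (k - 1) * splitIndex;        % number of rows before segment k
    % randperm(n, m) draws m distinct integers from 1..n (local positions).
    localRows = randperm(splitIndex, splitIndex - 1);
    sampled{k} = T(offset + localRows, :);
end
```

Because the loop runs over numSeg, the same code should carry over to a larger table as long as the rows divide evenly into segments; any leftover rows beyond numSeg*splitIndex are ignored in this sketch.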
If you've already constructed Seg in the way you want and just want to pick one of them at random then you can get a random index and use that one.
randomIndex = randi(numel(Seg))
SegToUse = Seg{randomIndex} % Braces return the cell's contents rather than a 1-by-1 cell
Of course if you want to do it over and over again, you can put it in a loop. If you want to make sure you don't ever use the same one again, you can use randperm:
randomIndexes = randperm(numel(Seg));
for k = 1 : whateverYouWant % whateverYouWant must not exceed numel(Seg)
    SegToUse = Seg{randomIndexes(k)}
end
And you might want to read this:
Sorry for any misunderstanding. I'm not trying to pick the segments at random; I'm trying to pick the datapoints within them at random, but I'm having difficulty defining the segments and randomly selecting within them.
Then try this:
col1 = [1; 2; 3; 4; 5; 6; 7; 8; 9];
col2 = {"08-Feb-2019 12:34:52"; "11-Feb-2019 16:07:17"; "16-Feb-2019 14:50:31"; "20-Feb-2019 05:43:51"; "25-Feb-2019 07:55:24"; "02-Mar-2019 11:06:27"; "04-Mar-2019 11:06:27"; "06-Mar-2019 11:06:27"; "08-Mar-2019 11:06:27"};
T = table(col1, col2)
% Split the table into numSeg contiguous segments.
numSeg = 3; % Split it into 3 segments
numRows = height(T);
splitIndex = floor(numRows/numSeg) % Number of datapoints in each segment
% Store each segment (a table of splitIndex consecutive rows) in its own cell.
segments = cell(numSeg, 1); % Preallocate.
for k = 1 : numSeg
    index1 = (k - 1) * splitIndex + 1 % First row of segment k
    index2 = k * splitIndex % Last row of segment k
    segments{k} = T(index1:index2, :);
end
% Now for each segment, get the table with the rows randomly ordered.
randomlyOrderedSegments = cell(numSeg, 1); % Preallocate.
for k = 1 : numel(segments)
    thisSegment = segments{k}; % thisSegment is a table, not a cell
    % Get a random ordering of rows within this k'th segment.
    randomRows = randperm(height(thisSegment));
    % Reorder the segment according to that random ordering and
    % put into a new cell array called randomlyOrderedSegments.
    randomlyOrderedSegments{k} = thisSegment(randomRows, :)
end
This is very helpful, particularly in separating the segments into cells. For my purposes, it seems that datasample is more suitable for what I'm trying to obtain. In the last section I believe you're rearranging the rows in each segment; however, I'm trying to sample datapoints from a segment splitIndex-1 times (one less than the number of datapoints in it).
I'm trying to generate 8 new tables that mimic the original table. So what I'm hoping the following code does is...
There will be nchoosek(splitIndex,splitIndex-1)^numSeg datasets created, stored in TNew, and each of those datasets is populated by appending a random sample from each segment. There should be 8 cells in TNew, with each cell holding a table of size [6,2].
TNew = zeros([numSeg*splitIndex-1 2]);
for i = 1:(nchoosek(splitIndex,splitIndex-1))^numSeg
for k = 1 : numel(segments)
randDataSample = vertcat(datasample(segments{k},splitIndex-1));
TNew{i}=randDataSample;
end
end
When I execute datasample(segments{k},splitIndex-1), I see it outputting the format I want, but I'm having issues with vertically concatenating the randomly sampled datapoints.
Again, thank you so much for all the help.
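On the concatenation issue, one common pattern is to collect the piece sampled from each segment in a cell array and call vertcat once after the inner loop, instead of overwriting TNew{i} on every pass. TNew also needs to be a cell array rather than the output of zeros for the brace assignment to work. The sketch below is only an illustration under a few assumptions: it rebuilds segments as in the earlier comment, uses placeholder labels in place of the timestamps, and swaps datasample (which samples with replacement by default) for randperm so no row is drawn twice. The names pieces, keep, and numNew are hypothetical.

```matlab
% Rebuild the 9-row example and its segments so the sketch runs on its own.
col1 = (1:9)';
col2 = "row " + (1:9)';          % placeholder labels instead of the timestamps
T = table(col1, col2);
numSeg = 3;
splitIndex = floor(height(T) / numSeg);
segments = cell(numSeg, 1);
for k = 1:numSeg
    segments{k} = T((k - 1) * splitIndex + 1 : k * splitIndex, :);
end

numNew = 5;                      % arbitrary number of random datasets for illustration
TNew = cell(numNew, 1);          % cell array, so TNew{i} can hold a table
for i = 1:numNew
    pieces = cell(numSeg, 1);
    for k = 1:numSeg
        seg = segments{k};
        % Pick splitIndex-1 distinct rows; sort to keep the original row order.
        keep = sort(randperm(height(seg), height(seg) - 1));
        pieces{k} = seg(keep, :);
    end
    TNew{i} = vertcat(pieces{:});   % stack the per-segment tables into one table
end
```

Each TNew{i} is then a 6-by-2 table for this 9-row example. Since every dataset is drawn independently, two of them can turn out identical, so this does not guarantee that all distinct combinations appear.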
I've gotten to the point where I can generate a cell array, and each cell should hold the datapoints drawn from the segments. Do you have any suggestions on how to concatenate the tables drawn from each segment?
%%
NumNewDatasets = (nchoosek(splitIndex,splitIndex-1))^numSeg;
TNew=cell(NumNewDatasets,1);
for k = 1 : numel(segments)
sampledData(k)= datasample(segments{k},splitIndex-1)
TNew{k,:}=table2cell(sampledData);
end
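If the goal is to obtain every one of the (nCr)^s distinct datasets exactly once, repeated random draws cannot guarantee that, so one alternative is to enumerate the combinations instead. The following is a rough sketch under that assumption, using the 6-row example from the question: nchoosek lists the ways to keep segLen-1 rows within a segment, and ndgrid forms the Cartesian product of those choices across segments. The names segLen, combos, choice, and rows are illustrative, and the sketch assumes at least 2 rows per segment and equal-sized contiguous segments.

```matlab
% Example table from the question (col2 stored as a string array here).
col1 = [3.0; 5.6; 10.2; 12.0; 14.4; 15.6];
col2 = ["08-Feb-2019 12:34:52"; "11-Feb-2019 16:07:17"; "16-Feb-2019 14:50:31"; ...
        "20-Feb-2019 05:43:51"; "25-Feb-2019 07:55:24"; "02-Mar-2019 11:06:27"];
T = table(col1, col2);

numSeg = 3;
segLen = floor(height(T) / numSeg);        % rows per segment (assumed >= 2)
combos = nchoosek(1:segLen, segLen - 1);   % each row: one way to keep segLen-1 local rows
numCombos = size(combos, 1);

% Cartesian product of the per-segment choices: one row of 'choice' per new dataset.
grids = cell(1, numSeg);
[grids{:}] = ndgrid(1:numCombos);
choice = cell2mat(cellfun(@(g) g(:), grids, 'UniformOutput', false));

TNew = cell(size(choice, 1), 1);
for i = 1:size(choice, 1)
    rows = zeros(1, 0);
    for k = 1:numSeg
        offset = (k - 1) * segLen;         % number of rows before segment k
        rows = [rows, offset + combos(choice(i, k), :)]; %#ok<AGROW>
    end
    TNew{i} = T(rows, :);                  % rows from all segments, in original order
end
```

For this table, TNew holds (2C1)^3 = 8 tables of 3 rows each, and TNew{1} contains the rows with values 3.0, 10.2, and 14.4, matching the T1 example in the question. The number of datasets grows quickly with segment size and segment count, so for larger tables drawing a random subset of the rows of choice may be more practical than building all of them.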

Release

R2023b

Asked:

Joy
on 23 Apr 2024

Commented:

Joy
on 25 Apr 2024
