resample data based on a particular variable

Boram Lim on 4 May 2018
Commented: Boram Lim on 4 May 2018
I have a large dataset like the one below. I want to randomly resample it by 'id' to produce a dataset of the same size. Since the data has 5 ids, I would like to sample 5 ids with replacement and build a new dataset from all the rows belonging to the sampled ids.
id value var1 var2
1 1
1 2
1 3
1 4
2 5
2 6
2 7
3 8
3 9
3 10
4 11
4 12
4 13
5 14
5 15
5 16
For this data, one possible output is shown below. Because ids are sampled with replacement, the same id can appear more than once (here, id 2 appears twice):
id value var1 var2
2 5
2 6
2 7
4 11
4 12
4 13
3 8
3 9
3 10
2 5
2 6
2 7
1 1
1 2
1 3
1 4
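The key step in the question is drawing the ids themselves. A minimal sketch, assuming the ids are the five values 1 to 5 shown above, contrasting the two built-in samplers: `randperm` reorders the ids without replacement, while `randi` makes independent draws, so ids can repeat. The names `permuted` and `sampled` are illustrative only.

```matlab
ids = (1:5)';                      % the five distinct ids from the question
N = numel(ids);

% randperm: every id appears exactly once; only the order changes (no replacement)
permuted = ids(randperm(N));

% randi: N independent draws, so one id can repeat while another is left out
sampled = ids(randi(N, N, 1));
```

Only the `randi` version matches "sample 5 ids with replacement"; the `randperm` version always returns all five ids.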
2 comments
KSSV on 4 May 2018
What is the difference between the two datasets? They look the same, except that id 2 is repeated in the second one.
Boram Lim on 4 May 2018
I want to randomly resample the data based on the id variable.


Answers (1)

KSSV on 4 May 2018
A = [1 1
1 2
1 3
1 4
2 5
2 6
2 7
3 8
3 9
3 10
4 11
4 12
4 13
5 14
5 15
5 16 ];
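id = A(:,1) ;                    % id of every row
ids = unique(id) ;               % distinct ids (they need not be 1..N)
N = numel(ids) ;
idx = ids(randi(N,N,1)) ;        % draw N ids WITH replacement (randperm would never repeat an id)
iwant = cell(N,1) ;
for i = 1:N
    iwant{i} = A(id==idx(i),:) ; % all rows belonging to the i-th sampled id
end
iwant = cell2mat(iwant)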
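The loop above compares the whole id column against each sampled id, so every row is scanned N times. A minimal sketch of an alternative, assuming the id is in the first column of `A` as in the question: group the row indices once with `accumarray`, then each resample is a single indexing step with no per-id loop. The names `rowsOf`, `pick` and `B` are hypothetical, chosen for illustration.

```matlab
% Sample data from the question: column 1 = id, column 2 = value
A = [1 1; 1 2; 1 3; 1 4; 2 5; 2 6; 2 7; 3 8; 3 9; 3 10; ...
     4 11; 4 12; 4 13; 5 14; 5 15; 5 16];

% One-time preprocessing: g(r) is the group number (1..N) of row r
[ids, ~, g] = unique(A(:,1));
N = numel(ids);
% rowsOf{k} holds the row indices of id ids(k), kept in their original order
rowsOf = accumarray(g, (1:size(A,1))', [N 1], @(r) {sort(r)});

% Each resample: draw N group numbers with replacement, then gather their rows
pick = randi(N, N, 1);
B = A(vertcat(rowsOf{pick}), :);
```

Since `rowsOf` depends only on `A`, it can be built once and reused when the resampling has to be repeated many times.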
1 comment
Boram Lim on 4 May 2018
Thank you for your answer. However, is there a simpler way that avoids the for-loop? My data has around 10 million rows, and this has to be repeated several times.
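One loop-free option is pure index arithmetic: sort the rows by id once, record where each id's block starts and how long it is, and then build the row indices of all sampled blocks with `repelem` and `cumsum`. This is a minimal sketch, assuming the id is in column 1 of `A` and MATLAB R2015a or later (for `repelem`); the variable names are illustrative, not from the thread.

```matlab
% Sample data from the question: column 1 = id, column 2 = value
A = [1 1; 1 2; 1 3; 1 4; 2 5; 2 6; 2 7; 3 8; 3 9; 3 10; ...
     4 11; 4 12; 4 13; 5 14; 5 15; 5 16];

% One-time preprocessing (depends only on A)
[ids, ~, g] = unique(A(:,1));          % g(r) = group number of row r
[~, order] = sort(g);                  % row indices grouped by id (sort is stable)
counts = accumarray(g, 1);             % number of rows per id
starts = cumsum([1; counts(1:end-1)]); % first position of each group within "order"
N = numel(ids);

% Per resample: draw N groups with replacement
pick = randi(N, N, 1);
len = counts(pick);                    % rows contributed by each sampled group

% Offsets 0..len(k)-1 inside every sampled block, built without a loop
blockStart = repelem(cumsum([0; len(1:end-1)]), len);
offsets = (0:sum(len)-1)' - blockStart;
pos = repelem(starts(pick), len) + offsets;
B = A(order(pos), :);
```

The resample always contains N ids, but its row count equals the original only when every id has the same number of rows; in the question's data id 1 has four rows and the others three.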

