How can I randomly divide a dataset (matrix) into k parts?
I have a database and I want to randomly divide it into k parts of equal size. If the database has n rows, each part will contain n/k rows randomly chosen from the dataset.
Accepted Answer
More Answers (2)
Oleg Komarov, 9 Sep 2012 (edited by Oleg Komarov on 9 Sep 2012)
Suppose you have an N-by-M matrix A. I would randomly permute the row positions 1:N and then group them into k partitions. Here is the code:
% Sample inputs
N = 100;
A = rand(N,2);
% Number of partitions
k = 6;
% Randomly permute the row positions
pos = randperm(N);
% Bin the positions into k partitions: partition ii covers positions
% edges(ii) to edges(ii+1)-1, and edges(k+1) = N+1
edges = round(linspace(1,N+1,k+1));
Now you can either "physically" partition A, or apply your code to the segments of A without actually separating it into blocks.
% Partition A
prtA = cell(k,1);
for ii = 1:k
idx = edges(ii):edges(ii+1)-1;
prtA{ii} = A(pos(idx),:); % or apply code to the selection of A
end
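If all you need is the blocks themselves, the loop can also be written as a single mat2cell call on the scrambled rows, because diff(edges) is the number of rows in each partition. This is a sketch that swaps in mat2cell (it is not part of the original answer); it rebuilds the sample inputs so it runs on its own, and `sizes` is an illustrative name.

```matlab
% Sample inputs (same as above)
N = 100;
A = rand(N,2);
k = 6;
pos = randperm(N);
edges = round(linspace(1,N+1,k+1));
% Number of rows in each partition, from consecutive edges
sizes = diff(edges);
% Split the scrambled rows into k blocks; all columns stay together
prtA = mat2cell(A(pos,:), sizes, size(A,2));
```

prtA is a k-by-1 cell array, just like the one built by the loop.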
EDIT
You can also avoid the loop, but in that case you have to build a group index that maps each row to the partition it belongs to, and then use accumarray() to run your code on each partition.
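A minimal sketch of that loop-free variant, assuming the same sample inputs. The group index `grp` and the helper variables `marks` and `binOfPos` are hypothetical names: cumsum over the bin starts gives the partition of every scrambled position, and accumarray then collects the rows of each partition into a cell. Replace the anonymous function with your own code if you want one result per partition instead of the blocks.

```matlab
% Sample inputs (same as above)
N = 100;
A = rand(N,2);
k = 6;
pos = randperm(N);
edges = round(linspace(1,N+1,k+1));
% Partition of each scrambled position: mark the first position of
% every bin with a 1 and take the cumulative sum
marks = zeros(N,1);
marks(edges(1:k)) = 1;
binOfPos = cumsum(marks);
% Group index: grp(r) is the partition that row r of A belongs to
grp = zeros(N,1);
grp(pos) = binOfPos;
% Run code on each partition; here each group's rows go into a cell,
% but any function returning a scalar, e.g. @(r) mean(A(r,1)), also works
prtA = accumarray(grp, (1:N)', [k 1], @(r) {A(r,:)});
```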
4 comments
Mariem Harmassi, 10 Sep 2012 (edited by Oleg Komarov on 10 Sep 2012)
Oleg Komarov, 10 Sep 2012 (edited by Oleg Komarov on 10 Sep 2012)
Why does it matter?
Anyway, after the loop:
% Reorder the partitions so that the smaller ones come last
[~,idx] = sort(diff(edges),'descend');
prtA = prtA(idx);
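For the sample inputs N = 100 and k = 6, the bins produced by round(linspace(...)) alternate between 17 and 16 rows, which is why a reorder is needed to push the smaller ones to the end. A short worked check (variable names are illustrative):

```matlab
% Partition sizes for the sample inputs
N = 100;
k = 6;
edges = round(linspace(1,N+1,k+1));
sizes = diff(edges)               % 17 16 17 17 16 17
% Order the partitions by size, largest first
[~,idx] = sort(sizes,'descend');
sizes(idx)                        % 17 17 17 17 16 16
```

prtA = prtA(idx) applies the same permutation to the cells.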
Mariem Harmassi, 10 Sep 2012
Oleg Komarov, 11 Sep 2012
To meet your requirements, I build the edges slightly differently:
% Sample inputs and random scrambling of the rows
N = 100;
A = rand(N,2);
k = 6;
pos = randperm(N);
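% Edges: blocks of ceil(N/k) rows, the last block smaller if needed
% (ceil rather than round: with round, e.g. N = 100 and k = 7, the last
% edge stops at 99 and the last rows of pos are never assigned)
edges = 1:ceil(N/k):N+1;
if numel(edges) < k+1
    edges = [edges N+1];
end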
% partition
prtA = cell(k,1);
for ii = 1:k
idx = edges(ii):edges(ii+1)-1;
prtA{ii} = A(pos(idx),:);
end
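Whether every row ends up in a partition depends on how the step between edges is rounded. The following sketch (variable names are illustrative) applies the edges rule with both round(N/k) and ceil(N/k) for N = 100 and k = 7, and reports how many scrambled positions the k partitions cover.

```matlab
N = 100;
k = 7;
for step = [round(N/k) ceil(N/k)]
    % Same edges rule as above, with the given step
    edges = 1:step:N+1;
    if numel(edges) < k+1
        edges = [edges N+1];
    end
    % Positions 1..covered are used by the k partitions
    covered = edges(k+1) - 1;
    fprintf('step %d: %d of %d rows assigned\n', step, covered, N);
end
% step 14: 98 of 100 rows assigned
% step 15: 100 of 100 rows assigned
```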
Azzi Abdelmalek, 9 Sep 2012 (edited by Azzi Abdelmalek on 10 Sep 2012)
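A = rand(210,4); [n,m] = size(A);   % sample data: n rows, m columns
np = 20;                            % number of rows in each part
[~,idx] = sort(rand(n,1));          % random permutation of the row indices
C = A(idx,:);                       % rows in random order
idnan = mod(np-rem(n,np),np);       % NaN rows needed to reach a multiple of np
C = [C; nan(idnan,m)];              % pad with NaN rows
[n,m] = size(C);
res = zeros(np,m,n/np);             % preallocate: one np-by-m page per part
for k = 1:n/np
    ind = (k-1)*np+1:k*np;          % rows of the k-th part
    res(:,:,k) = C(ind,:);
end
idxo = reshape([idx; nan(idnan,1)],np,1,n/np); % your original index (NaN for padding)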
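To map a part back to the rows of A, take the matching page of idxo and drop the NaN padding. A sketch that rebuilds the example above; `nparts`, `rows`, and `part` are illustrative names.

```matlab
% Rebuild the example: 210 rows, parts of np = 20 rows
A = rand(210,4); [n,m] = size(A);
np = 20;
[~,idx] = sort(rand(n,1));
idnan = mod(np-rem(n,np),np);
C = [A(idx,:); nan(idnan,m)];
nparts = size(C,1)/np;
res = zeros(np,m,nparts);
for p = 1:nparts
    res(:,:,p) = C((p-1)*np+1:p*np,:);
end
idxo = reshape([idx; nan(idnan,1)],np,1,nparts);
% Original row numbers of the last part, without the NaN padding
p = nparts;
rows = idxo(:,1,p);
rows = rows(~isnan(rows));
% The padding rows sit at the end of the page
part = res(1:numel(rows),:,p);
isequal(part, A(rows,:))          % true
```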
9 comments
Mariem Harmassi, 10 Sep 2012
Azzi Abdelmalek, 10 Sep 2012 (edited by Azzi Abdelmalek on 10 Sep 2012)
The original index is idxo.
Check the updated code.
Mariem Harmassi, 10 Sep 2012
Azzi Abdelmalek, 10 Sep 2012
idnan = mod(np-rem(n,np),np)
% If you have 205 rows and np = 20, we have to pad with 15 NaN rows, which will be ignored; this lets us split the 220 rows into 220/20 = 11 parts, as you said.
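The outer mod is what handles an n that is already a multiple of np: without it, np - rem(n,np) would add a whole extra block of np NaN rows. A quick check for a few values of n, chosen for illustration:

```matlab
np = 20;
n = [200 205 219 221];
% NaN rows needed to reach the next multiple of np; mod maps a full
% extra block (np) to 0 when n is already a multiple of np
idnan = mod(np - rem(n,np), np)   % 0 15 1 19
```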
Mariem Harmassi, 10 Sep 2012 (edited by Mariem Harmassi on 10 Sep 2012)
Azzi Abdelmalek, 10 Sep 2012 (edited by Azzi Abdelmalek on 11 Sep 2012)
% Instead of reshape, use:
for k = 1:n/np
    ind = (k-1)*np+1:k*np;     % rows of the k-th part
    res(:,:,k) = C(ind,:);
end
Azzi Abdelmalek, 10 Sep 2012
Because reshape works column by column (column-major order).
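Because reshape fills column by column, reshaping C directly into np-by-m pages mixes entries from different columns and parts. One common workaround, sketched here rather than taken from the thread, is to reshape the transpose and then swap the first two dimensions with permute; it gives the same np-by-m-by-nparts array as the loop. The sample size of 205 rows follows the example discussed above; `nparts` and `tmp` are illustrative names.

```matlab
% Sample inputs: 205 rows, parts of np = 20 rows
A = rand(205,4);
[n,m] = size(A);
np = 20;
% Random row order, padded with NaN rows up to a multiple of np
idx = randperm(n)';
idnan = mod(np - rem(n,np), np);
C = [A(idx,:); nan(idnan,m)];
nparts = size(C,1)/np;
% Column j of C.' is row j of C, so each m-by-np page holds one part
tmp = reshape(C.', m, np, nparts);
% Swap the first two dimensions: np-by-m-by-nparts, as in the loop
res = permute(tmp, [2 1 3]);
```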
Mariem Harmassi, 11 Sep 2012
Azzi Abdelmalek, 11 Sep 2012
Look at the updated code.