How do I do cross-validation partitioning and training on big data? (I am facing an issue with my big data using tall arrays)

Hi,
I am not an expert in MATLAB big data handling. I have been struggling for a few days because I have to do feature selection and train a machine learning classifier on my data. I have followed the MATLAB docs and a few videos from MathWorks, but I am still not able to make it work.
I have 12 MAT-files (image-001.mat, image-002.mat, ..., image-012.mat) containing features extracted from 3D images, saved in one folder. Each MAT-file has around 600,000 records and stores two variables: the data and the class label for each record. I need to load all of them into MATLAB as the training set and train a machine learning model on the data. However, after loading only 2 files, I get the following error:
Index exceeds matrix dimensions.
Here is what I did:
1) I saved all of them as CSV files in featFolder and created a datastore:
csvList = dir(fullfile(featFolder, '*.csv')); % list the CSV files in the folder of extracted features
no_subjects = length(csvList);
ds = datastore(fullfile(featFolder, '*.csv'));
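Step 1 depends on the MAT-files having been converted to CSV first. A minimal sketch of that conversion follows. It assumes each MAT-file stores its two variables under the names data (N-by-P features) and labels (N-by-1 class labels); these names are hypothetical placeholders, since the real ones are not given. It also writes two small synthetic files to tempdir so it runs on its own. writetable is used because it writes a header row, and datastore treats the first line of a CSV file as variable names by default (ReadVariableNames is true).

```matlab
% Demo setup: two small synthetic MAT-files standing in for image-001.mat, image-002.mat.
featFolder = fullfile(tempdir, 'featDemo');
if ~exist(featFolder, 'dir'), mkdir(featFolder); end
rng(0);
for k = 1:2
    data = randn(50, 4);                 % hypothetical variable name: feature matrix
    labels = randi(3, 50, 1);            % hypothetical variable name: class labels 1..3
    save(fullfile(featFolder, sprintf('image-%03d.mat', k)), 'data', 'labels');
end

% Convert every MAT-file to a CSV file whose last column holds the label.
matList = dir(fullfile(featFolder, 'image-*.mat'));
for k = 1:numel(matList)
    S = load(fullfile(featFolder, matList(k).name), 'data', 'labels');
    T = array2table(S.data);             % feature columns with default names
    T.label = S.labels(:);               % labels become the last column
    [~, base] = fileparts(matList(k).name);
    writetable(T, fullfile(featFolder, [base '.csv']));   % header row written by default
end

% One datastore over all CSV files; fullfile avoids hard-coding '\' as the separator.
ds = datastore(fullfile(featFolder, 'image-*.csv'));
preview(ds)
```

To convert the real files, point featFolder at the feature folder and drop the synthetic-data loop.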
2) I created a tall array from the datastore:
TA=tall(ds);
3) When I pass the tall array to a custom function, I get an error:
fsFeatures = call_sequentialfs(TA{:,1:end-1}, TA{:,end});

% This is the custom function
function inmodel = call_sequentialfs(X, y)
    classes = unique(y);
    SVMModels = cell(3,1);
    inmodel = cell(3,2); % the first column keeps the selected features and the second column keeps the history
    rng(1);
    c2 = cvpartition(y, 'HoldOut', 1/10); % for tall arrays
    opts = statset('display', 'iter', 'UseParallel', 1);
    for j = 2:numel(classes)
        indx = eq(y, classes(j)); % create binary classes for each classifier
        fun = @(Xtrain, Ytrain, Xtest, Ytest) ...
            sum(Ytest ~= predict(fitcsvm(Xtrain, Ytrain, 'ClassNames', [false true], 'Standardize', true, ...
            'KernelFunction', 'rbf', 'BoxConstraint', 1), Xtest));
        [inmodel{j,1}, inmodel{j,2}] = sequentialfs(fun, X, indx, 'cv', c2, 'options', opts, 'nfeatures', 80);
    end
end
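As far as I can tell from their reference pages, sequentialfs and fitcsvm work on in-memory arrays only, so passing tall arrays to call_sequentialfs is unlikely to work even once the cvpartition call succeeds; check the Extended Capabilities section of each page for your release. One workaround, sketched here under that assumption, is to draw a stratified subsample with a tall cvpartition holdout, gather it into memory, and run the same one-vs-rest selection on it. The synthetic data, the 5% sample fraction and 'nfeatures' = 2 are placeholders, not values from the question.

```matlab
mapreducer(0);                                   % run tall computations in this MATLAB session
rng(1);
n = 20000;
Xmem = randn(n, 10);                             % synthetic stand-in for the feature columns
ymem = 1 + (Xmem(:, 1) > 0.5) + (Xmem(:, 2) > 0.5);   % classes 1..3 driven by features 1 and 2
TA = tall([Xmem, ymem]);                         % one tall array, like the datastore-backed TA
tX = TA(:, 1:end-1);
ty = TA(:, end);

% Stratified ~5% sample: the test set of a tall holdout partition, gathered in one pass.
cSample = cvpartition(ty, 'HoldOut', 0.05);      % tall cvpartition supports holdout only
idx = test(cSample);
[Xs, ys] = gather(tX(idx, :), ty(idx));

% In-memory one-vs-rest feature selection on the sample, mirroring call_sequentialfs.
classes = unique(ys);
selected = false(numel(classes), size(Xs, 2));
cvIn = cvpartition(ys, 'HoldOut', 1/10);         % ordinary in-memory partition of the sample
critFun = @(Xtr, Ytr, Xte, Yte) sum(Yte ~= predict( ...
    fitcsvm(Xtr, Ytr, 'ClassNames', [false true], 'Standardize', true, ...
            'KernelFunction', 'rbf', 'BoxConstraint', 1), Xte));
for j = 1:numel(classes)
    isClass = (ys == classes(j));                % binary labels: class j vs. the rest
    selected(j, :) = sequentialfs(critFun, Xs, isClass, 'cv', cvIn, 'nfeatures', 2);
end
disp(selected)
```

Each row of selected marks the features chosen for one class. With 'nfeatures' set, sequentialfs keeps adding features until exactly that many are selected.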
This is the error I get when execution reaches the cross-validation partitioning line:
Error using internal.stats.bigdata.cvpartitionTallImpl>lazyAssert (line 195)
Incompatible tall array arguments. The tall arrays must be created using the same execution environment.
Error in internal.stats.bigdata.cvpartitionTallImpl (line 80)
LAS = lazyAssert(floor(cv.N * T)>0, LAS, @() error(message('stats:cvpartition:PTooSmall')),clientfun);
Error in cvpartition (line 153)
cv.Impl = internal.stats.bigdata.cvpartitionTallImpl(varargin{:});
Error in tall/cvpartition (line 20)
cv = cvpartition({t,t.Adaptor,@partitionfun,@clientfun},varargin{:});
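The message says the tall arrays involved were created in different execution environments. One possible cause, assumed here because the full session is not shown, is that the parallel environment changed (for example, a parallel pool was started or shut down) between creating TA and calling cvpartition. A minimal sketch that pins the environment with mapreducer before any tall array is created and derives both X and y from the same tall array; the synthetic matrix M stands in for the CSV datastore.

```matlab
% Pick the execution environment BEFORE creating any tall array.
mapreducer(0);            % serial execution in this MATLAB session
% With Parallel Computing Toolbox you could use mapreducer(gcp) instead; either way,
% avoid starting or deleting parallel pools after the tall arrays below exist.

rng(2);
M = [randn(5000, 3), randi(3, 5000, 1)];    % synthetic stand-in: 3 features + label column
TA = tall(M);                               % in the question, TA = tall(ds) instead
X = TA(:, 1:end-1);                         % TA{:,1:end-1} when TA is a tall table
y = TA(:, end);

c2 = cvpartition(y, 'HoldOut', 1/10);       % tall cvpartition: holdout only
Xtrain = X(training(c2), :);                % deferred: nothing is computed yet
[nTest, nTrain] = gather(sum(test(c2)), size(Xtrain, 1));
fprintf('%d test rows, %d training rows\n', nTest, nTrain);
```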
How can cross-validation and classifier training be done in this situation? I would really appreciate any help or suggestions. If you know of any link or code, please share it.
Thanks
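For the training step, classifiers that accept tall arrays can be fit directly on the training rows of a tall holdout partition. A minimal sketch with fitclinear follows; it trains a linear SVM by default, so it is a different model from the RBF SVM in the question, and the binary data are synthetic. fitckernel, which approximates a Gaussian-kernel model, may be closer to the original; check whether your release supports tall arrays for it before relying on that.

```matlab
mapreducer(0);                                  % same execution environment for all tall arrays below
rng(3);
n = 20000;
Xmem = randn(n, 5);
ymem = double(Xmem(:, 1) - Xmem(:, 2) + 0.3 * randn(n, 1) > 0);   % synthetic binary labels 0/1
TA = tall([Xmem, ymem]);
X = TA(:, 1:end-1);
y = TA(:, end);

c = cvpartition(y, 'HoldOut', 1/10);            % stratified holdout on the tall labels
mdl = fitclinear(X(training(c), :), y(training(c)));   % linear SVM trained on tall data
yHat = predict(mdl, X(test(c), :));             % tall predictions, evaluated lazily
testErr = gather(mean(yHat ~= y(test(c))));     % triggers the pass over the test rows
fprintf('Holdout misclassification rate: %.3f\n', testErr);
```

Operations on tall arrays are deferred until gather (or a function such as fitclinear that needs the data) is called, so building the predictions and the error expression costs no passes over the data until the final gather.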
