Create a signalDatastore from CSV files
Luca Reali
on 24 Apr 2023
Answered: LeoAiE
on 24 Apr 2023
Hello everyone,
I need to train some neural networks on a large dataset, but this is my first time working with data at this scale and I'm stuck. I'd like to create a signalDatastore, which seems to be the best option for my purpose, from many CSV files. Each file contains around 80 features (one per column), but the features are not the same in every file. I'd like to create a datastore from these CSV files, then select only the features present in every file (through SelectedVariableNames) and go on to filtering, etc. I already know which features are shared among all the files, and I'd rather not read every file and pre-select them before creating the datastore, because that would cost time and resources.
If my workflow is not correct, please let me know. I'd be happy to hear from you.
Thanks in advance.
Accepted Answer
LeoAiE
on 24 Apr 2023
In your case, you can use a tabularTextDatastore to read the CSV files, and then use the SelectedVariableNames property to select only the shared features. Since you mentioned that you already know which features are shared among all the files, you can directly set the SelectedVariableNames property.
Here's an example of how to create a tabularTextDatastore from multiple CSV files, and select specific features using SelectedVariableNames:
% Create a list of CSV files
fileList = {'file1.csv', 'file2.csv', 'file3.csv'}; % Replace with your actual file names
% Create a tabularTextDatastore from the list of CSV files
ds = tabularTextDatastore(fileList, 'TreatAsMissing', 'NA', 'MissingValue', NaN);
% Set the selected features (Replace 'Feature1', 'Feature2', etc. with your actual shared feature names)
sharedFeatures = {'Feature1', 'Feature2', 'Feature3'};
ds.SelectedVariableNames = sharedFeatures;
% Read all the data into memory at once (if the data does not fit in memory, call read(ds) in a loop instead)
data = readall(ds);
% Proceed with filtering, processing, and training the neural network
By using the SelectedVariableNames property, you'll only read the shared features from each CSV file, avoiding the need to pre-process or filter the data beforehand. This should help you save both time and resources.
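One caveat worth checking: to my understanding, tabularTextDatastore takes its variable names and text format from the first file, so files whose column sets or column order differ may not line up as expected. The sketch below is one possible alternative, not the only way to do it: a fileDatastore reads each CSV file with readtable, and transform keeps only the shared columns by name, one file at a time. The temporary folder, the file contents, and the feature names are made up for illustration. Note that this variant still parses every column of a file before discarding the extra ones.

```matlab
% Build two small CSV files with different column sets and column order
% (illustrative data only; replace with your own folder of CSV files).
tmpDir = tempname;
mkdir(tmpDir);
T1 = table([1; 2], [10; 20], [0.1; 0.2], ...
    'VariableNames', {'Feature1', 'Feature2', 'ExtraA'});
T2 = table([7; 8], [3; 4], [30; 40], ...
    'VariableNames', {'ExtraB', 'Feature1', 'Feature2'});
writetable(T1, fullfile(tmpDir, 'file1.csv'));
writetable(T2, fullfile(tmpDir, 'file2.csv'));

% Features known to be present in every file (hypothetical names).
sharedFeatures = {'Feature1', 'Feature2'};

% fileDatastore calls readtable once per file, so each file is parsed
% with its own header row.
fds = fileDatastore(tmpDir, 'ReadFcn', @readtable, 'FileExtensions', '.csv');

% Keep only the shared columns, selected by name and in a fixed order.
tds = transform(fds, @(T) T(:, sharedFeatures));

% Process one file at a time instead of loading everything with readall.
numRows = 0;
while hasdata(tds)
    chunk = read(tds);                 % table with the shared features only
    numRows = numRows + height(chunk); % placeholder for filtering/training steps
end
```

Because the columns are selected by name for each file separately, the column order inside each file does not matter. If every file really does have the same layout, the tabularTextDatastore approach above is simpler and avoids parsing the unused columns.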