Extracting specific data from multiple excel files and create a single matrix from those

Hi, I have a file on my computer with close to 1000 excel files and I don't want to manually extract the second row from every excel file manually and combine into a single excel file.
I was wondering how I can do this on Matlab through reading every Excel file in that specific file and taking out all of the columns from the second rows and then combining all that into a single matrix?
Thanks for the help

 Respuesta aceptada

hello
this is one example if you want to work out the entire folder
I assumed it would be numeric data so I used importdata (faster)
I also sorted the files names in natural order in case it might be relevant
It works even if your files have different size (number of columns)
clc
clearvars
fileDir = cd;
outfile = 'OUT.xlsx'; % output file name
fileNames = dir(fullfile(cd,'*.xlsx')); % get list of all excel files in directory
fileNames_sorted = natsortfiles({fileNames.name}); % sort file names into order
M= length (fileNames_sorted);
for f = 1:M
% option # 1 for numeric data only using importdata
raw = importdata( fullfile(fileDir, fileNames_sorted{f}));
second_row{f} = raw(2,:); % extract the second row
end
% write all second row lines into a matrix and store it in excel file
writecell(second_row',fullfile(cd,outfile));

14 comentarios

Hi Mathieu, I am getting this error reading
Unrecognized function or
variable 'second_row'.
Error in output (line 26)
writecell(second_row',fullfile(cd,outfile));
>>
What should I do?
Thanks
hello Jonas
can you check if the for loop is working ok ? do you get an output from this line :
what is displayed in your workspace when you type raw ?
raw = importdata( fullfile(fileDir, fileNames_sorted{f}));
When I type raw, I get >> raw
Unrecognized function or
variable 'raw'.
yep my raw is exactly the same as yours.
so this means this line of code could not be executed ....
if you can share a couple of xls files I would like to test on my side...
Yeah sure, I'll send you a small folder of samples. I've tried using this folder and it returned an excel file called OUT that only had two zeroes in the excel file.
From the previous test I tried using a folder that contained 1000 excel files and Matlab somehow printed that error possibly because its too hard to process?
Also if this is the case I could create multiple smaller folders and then combine that into the Output matrix that was previously obtained and how would I do that?
Thanks
hello again
I simply modified one of your data file (ARTEMIS_SET_control_10%.xlsx) from xlsx to CSV format so they all have the same format. If you need to dig with a mix of CSV and XLSX files , I could update my code.
After that , I had not much to do and this code works like a charm, at least for this batch of 24 files;
see also at the end of my code , if you wish to have the filenames also stored in the OUT file, attached FYI
slightly updated code :
clc
clearvars
fileDir = cd;
outfile = 'OUT.xlsx'; % output file name
fileNames = dir(fullfile(fileDir,'*.csv')); % get list of files in directory
fileNames_sorted = natsortfiles({fileNames.name}); % sort file names into order (https://fr.mathworks.com/matlabcentral/fileexchange/47434-natural-order-filename-sort)
M= length (fileNames_sorted);
for f = 1:M
% option # 1 for numeric data only using importdata
raw = importdata( fullfile(fileDir, fileNames_sorted{f}));
second_row{f} = raw(2,:); % extract the second row
end
% write all second row lines into a matrix and store it in excel file
writecell(second_row',fullfile(cd,outfile)); % without filenames
% writecell([fileNames_sorted' second_row'],fullfile(cd,outfile)); % with filenames stored in column 1
(seems to me a lot of files had the same data inside , just the file names are different)
Sorry this is not working for me, all the files are different but its only printing out 401, 0 for every column.
Its supposed to do (2,:) copy everything in the 2nd row from each excel file and then every row thats copied to copy over into a single excel file or a matrix on matlab. Since there are 24 excel files in this batch there are supposed to be 24 rows and 936 columns for the OUT matrix
Since this is to be used for principal components analysis. I only am interested in the second row since its got the absorption of the spectra and the first row simply is the wavenumbers which is the same for all the excel files and is useless information
ok I believe I guess what you really want- which is not what I understood so far
so you want the 2nd column (and not the second row ! ) of each data file
then ok the output size will be 24 rows and 936 columns
as all input files have same dimensions, I could make the code simpler and use writematrix instead of writecell
here code :
clc
clearvars
fileDir = cd;
outfile = 'OUT.xlsx'; % output file name
fileNames = dir(fullfile(fileDir,'*.csv')); % get list of files in directory
fileNames_sorted = natsortfiles({fileNames.name}); % sort file names into order (https://fr.mathworks.com/matlabcentral/fileexchange/47434-natural-order-filename-sort)
M= length (fileNames_sorted);
second_col= [];
for f = 1:M
% option # 1 for numeric data only using importdata
raw = importdata( fullfile(fileDir, fileNames_sorted{f}));
second_col= [second_col raw(:,2)]; % extract the second column
end
% write all second columns lines into a matrix and store it in excel file
writematrix(second_col,fullfile(cd,outfile));
Wow this works!
Thank you so much
Sorry I have another question, I need to create a { } cell which contains group names within it.
I need to create one thats a 5046x1 cell. containing 1682 entries saying Spot1 then 1682 entries saying Spot 2 then 1682 entries saying Spot 3. This is to group off my spectra, Do you want me to repost this so you can get more votes?
Thanks
Hi Jonas
it's not just a question of votes but yes indeed each question / topic should be addressed in a separate post
this way you can also get ore answers because it's not burried in the original post .

Iniciar sesión para comentar.

Más respuestas (1)

Ive J
Ive J el 31 de Ag. de 2021
Editada: Ive J el 31 de Ag. de 2021
You can use readmatrix (assuming all values are numeric, otherwise use readtable) or fileDatastore to read those files. Something like this should work:
myfiles = ["file1.xlsx", "file2.xlsx"]; % file names: use dir to generate file names within the target directory
data = [];
for i = 1:numel(myfiles)
tab = readmatrix(myfiles(i)); % see readmatrix for more options
data = [data; tab(2, :)]; % aggregate 2nd row from all files into a new matrix
end

13 comentarios

Hi Ive J, Thanks for that, I was wondering how can I generate all the file names into myfiles without writing all the files out?
Thanks
This would work:
targetDir = pwd; % only if files are within the current directory
filenames = string({dir(fullfile(targetDir, '*.xlsx')).name}); % converted to string to be used in my example above
Hi Ive, should I plug filenames into myfiles as myfiles=[filenames]
Because doing that gives me only 'Output' from the code as the output?
filenames and myfiles are the same. You can use my example as:
targetDir = pwd; % only if files are within the current directory
myfiles = string({dir(fullfile(targetDir, '*.xlsx')).name});data = [];
for i = 1:numel(myfiles)
tab = readmatrix(myfiles(i)); % see readmatrix for more options
data = [data; tab(2, :)]; % aggregate 2nd row from all files into a new matrix
end
% data is the matrix you're trying to generate from your excel files.
Sorry, I tried this and again didn't work it only printed out 'output'
What's output? There is no variable called output in my snippet. I used some of your files and it works just fine:
targetDir = pwd; % only if files are within the current directory
myfiles = string({dir(fullfile(targetDir, '*.csv')).name});data = []; % you mentioned you have Excel files, which was wrong
for i = 1:numel(myfiles)
tab = readmatrix(myfiles(i)); % see readmatrix for more options
data = [data; tab(2, :)]; % aggregate 2nd row from all files into a new matrix
end
myfiles =
"18136a AS ABS BC.CSV"
"18136c AS ABS BC.CSV"
"18136d Deep Ocean 3%.CSV"
"18136f AS ABS BC.CSV"
"18136g Deep Ocean 4.5%.CSV"
data =
401.1380 0
401.1380 0
401.1380 0
401.1380 0
401.1380 0
Sorry Ive J, That works but I can't write each file out individually since its 1000 files.
Would you know how to make it work without manually inputting the file names?
Cheers
I didn't manually read files. The second line gets the names of all CSV files within the directory.
Sorry, I mean its only printing out a 2x5 matrix with 401.1380 and 0 like shown in the data.
Its supposed to be for example a 25x936 matrix if there are 25 excel files containing infrared spectra with 2 rows and 936 columns each. The 1st row only contains the wavenumber data which needs to be ignored and the 2nd row contains the absorption which is different for every excel file and needs to be extracted and then combined into the final matrix.
I'm using this for principal components analysis.
It's 5X2 matrix because first, I only used 5 CSV files, and secondly, your sample CSV files contained only 2 columns, so it would be simply a 5X2 matrix.
In case you have 1000 files in the folder, myfiles would be a string array of 1000 CSV file names. If each of those CSV files have also 936 columns, then the resulting matrix would be of size 1000X936.
Note that this line
myfiles = string({dir(fullfile(targetDir, '*.csv')).name});
extracts all CSV files within the target directory (path to the folder your CSV files are in), so doesn't matter if there are 5, 1000 or even more CSV files there, myfiles would still contain all those files which then will be looped over to extract the values exist in 2nd line.
Sorry there are 936 columns in each excel file it just doesn't work for some reason..
Please attach some of these 1000 files (with 936 columns) you're trying to work with.
Also, please be more specific with ...doesn't work for some reason.. What exact error do you get in command window when running my snippet?
Sorry, the error was that it was printing out only 401 and 0. The problem has been solved now I really appreciate the help.

Iniciar sesión para comentar.

Productos

Preguntada:

el 31 de Ag. de 2021

Comentada:

el 1 de Sept. de 2021

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by