Extracting file name elements to vectors

Ellen Maas on 16 Jul 2019
Answered: Ellen Maas on 18 Jul 2019
Hi, I have some climate files separated by minimum and maximum monthly temperatures for a given latitude and longitude from a variety of climate models. I need to combine the two files (tmin and tmax) to create one file with the single average temperature for each month. I'm working on a more elegant, bulk-process solution rather than hard-coded brute force as I have a couple of hundred files to pair up and process. Then I'll be using the average temp files for some analysis.
My specific question is how to get elements from the file names into vectors that I can then use to programmatically pair up the tmin and tmax files. Here is a very simplified example:
clear
tmin = [1 1949 -2.54; 2 1949 -0.07];
tmax = [1 1949 16.04; 2 1949 18.74];
% create four files for tmin and tmax for two climate models
fileID = fopen('LatLon_34_-103_tmin_cm1.dat','w');
fprintf(fileID,'%d %d %.2f\n',tmin .');
fclose(fileID);
fileID = fopen('LatLon_34_-103_tmax_cm1.dat','w');
fprintf(fileID,'%d %d %.2f\n',tmax .');
fclose(fileID);
fileID = fopen('LatLon_34_-103_tmin_cm2.dat','w');
fprintf(fileID,'%d %d %.2f\n',tmin .');
fclose(fileID);
fileID = fopen('LatLon_34_-103_tmax_cm2.dat','w');
fprintf(fileID,'%d %d %.2f\n',tmax .');
fclose(fileID);
% build list of files
curDirFiles = dir('*.dat');
fileList = {curDirFiles.name}; %creates a cell array for "contains" function
s = string([34,-103]); % convert lat/lon to string array
search_list = [s,'tmin','tmax']; % build list of strings to search for in file names
file_idx = find(contains(fileList,search_list)); % create index of file names that include the search strings
% extract elements from file names
for i = 1:length(file_idx)
    % decompose file name
    [latlon,remainder1] = strtok(curDirFiles(file_idx(i)).name,'_');
    [lat,remainder2] = strtok(remainder1,'_');
    [lon,remainder3] = strtok(remainder2,'_');
    [climate_attribute,remainder4] = strtok(remainder3,'_');
    [climate_model,remainder5] = strtok(remainder4,'_.'); % '.' added as a delimiter so '.dat' is not part of the token
    % do stuff
end
As you can see, the for loop at present will only process one file at a time (and only preserve file name elements as scalars good for one loop), which obviously will not allow me to find and access two files at once. Ultimately, I will need to group them by lat, lon and climate model to get the pairs I need to average which is why I'm interested in getting the file elements into vectors.
For example, given the data above, as vectors, I would want to end up with:
lat = [34 34 34 34]
lon = [-103 -103 -103 -103]
climate_attribute = ["tmin" "tmax" "tmin" "tmax"]
climate_model = ["cm1" "cm1" "cm2" "cm2"]
But in the code above as written, only one file is processed at a time and the file elements are stored as scalars and overwritten with each pass. How do I preserve them into vectors? A bonus would be to do it in a way that eliminates the for loop too, if it's possible.
So I'm wondering if I can do something with arrayfun, for example, that will allow me to process all the files in batch. arrayfun didn't seem to accept strtok as a function, though: strtok requires inputs, and my attempt produced the error "Cannot call or index into a temporary array".
Any ideas how to approach this?
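A minimal sketch of one way to keep the elements as vectors while still using a loop: preallocate one container per file name element and index into it with the loop counter, so nothing is overwritten. The hard-coded names list stands in for the output of dir, and strsplit and fileparts are swapped in for strtok; both choices are assumptions made for illustration.

```matlab
% Hypothetical file names following the pattern in the question
names = {'LatLon_34_-103_tmax_cm1.dat', 'LatLon_34_-103_tmax_cm2.dat', ...
         'LatLon_34_-103_tmin_cm1.dat', 'LatLon_34_-103_tmin_cm2.dat'};
n = numel(names);

% Preallocate one container per file name element
lat = zeros(1, n);
lon = zeros(1, n);
climate_attribute = cell(1, n);
climate_model = cell(1, n);

for i = 1:n
    [~, base] = fileparts(names{i});   % drop the '.dat' extension
    parts = strsplit(base, '_');       % {'LatLon','34','-103','tmax','cm1'}
    lat(i) = str2double(parts{2});     % numeric latitude
    lon(i) = str2double(parts{3});     % numeric longitude
    climate_attribute{i} = parts{4};   % 'tmin' or 'tmax'
    climate_model{i} = parts{5};       % e.g. 'cm1'
end
```

Because each element is stored at position i, logical indexing such as strcmp(climate_model, 'cm1') can then be used to select the files that belong together.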

Accepted Answer

Ellen Maas on 18 Jul 2019
What I figured out was instead of using strtok, I used extractBetween to pull segments of the file names and store them in a cell array:
for i = 1:length(file_idx)
    % decompose file name
    fname = curDirFiles(file_idx(i)).name;
    k = strfind(fname,'_'); % positions of the underscores
    l = strfind(fname,'.'); % positions of the periods
    fileDecompose(i,1) = extractBetween(fname,1,k(1)-1); % "LatLon"
    fileDecompose(i,2) = extractBetween(fname,k(1)+1,k(2)-1); % lat
    fileDecompose(i,3) = extractBetween(fname,k(2)+1,k(3)-1); % lon
    fileDecompose(i,4) = extractBetween(fname,k(3)+1,k(4)-1); % climate attribute (tmin/tmax)
    fileDecompose(i,5) = extractBetween(fname,k(4)+1,l(end)-1); % climate model (last element before the extension)
    % do stuff
end
This put all the unique file name elements into the array, matrix-style.
4x5 cell:
'LatLon' '34' '-103' 'tmax' 'cm1'
'LatLon' '34' '-103' 'tmax' 'cm2'
'LatLon' '34' '-103' 'tmin' 'cm1'
'LatLon' '34' '-103' 'tmin' 'cm2'
So this solved the immediate question.
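For readers who also want to drop the loop, here is a sketch of a loop-free variant. It is not the code used above; it assumes the names are held in a column string array (double-quoted strings need R2017a or later) and that every name has the same number of underscore-separated elements, which split requires. The names array is a stand-in for the dir output.

```matlab
% Hypothetical file names, stored as a column string array
names = ["LatLon_34_-103_tmax_cm1.dat"; "LatLon_34_-103_tmax_cm2.dat"; ...
         "LatLon_34_-103_tmin_cm1.dat"; "LatLon_34_-103_tmin_cm2.dat"];

base = erase(names, ".dat");     % strip the extension from every name
parts = split(base, "_");        % 4x5 string array, one row per file

lat = str2double(parts(:, 2));   % numeric column vector
lon = str2double(parts(:, 3));   % numeric column vector
climate_attribute = parts(:, 4); % "tmax" / "tmin"
climate_model = parts(:, 5);     % "cm1" / "cm2"
```

With string arrays, comparisons such as climate_attribute == "tmin" return logical vectors directly, which makes the later grouping step straightforward.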
For a complete follow-up, I then used Luis Mendo's hack on the "unique" function to get it to work with cell array data: https://stackoverflow.com/questions/24151853/matlab-cell-array-to-string-vector-unique
That gave me a cell array with the unique values of file name elements other than 'tmin' and 'tmax'. Then I used that unique list to build the file names back, creating two file names by manually adding in 'tmax' and 'tmin' to each unique combination, which gave me the correct pairs I needed. Then I just imported the data, averaged the values, and wrote it out to another file.
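The pairing step described here can be sketched roughly as follows. This is an illustration under assumptions, not the code that was actually used: the decomposed names are held in a string array instead of a cell array, a joined key string replaces the unique workaround, and the averaging runs on the in-memory tmin and tmax matrices from the question instead of reading the files (in practice each pair would be imported first, for example with readmatrix or load).

```matlab
% Decomposed names, matching the 4x5 result shown above (as a string array)
parts = ["LatLon" "34" "-103" "tmax" "cm1"; ...
         "LatLon" "34" "-103" "tmax" "cm2"; ...
         "LatLon" "34" "-103" "tmin" "cm1"; ...
         "LatLon" "34" "-103" "tmin" "cm2"];

% Key built from everything except the climate attribute (column 4)
prefix = join(parts(:, 1:3), "_");   % 4x1, e.g. "LatLon_34_-103"
model = parts(:, 5);
key = prefix + "|" + model;
[~, firstIdx] = unique(key);         % first occurrence of each unique key

% Rebuild both file names for each unique lat/lon/model combination
tminFiles = prefix(firstIdx) + "_tmin_" + model(firstIdx) + ".dat";
tmaxFiles = prefix(firstIdx) + "_tmax_" + model(firstIdx) + ".dat";

% Average one pair; columns are month, year, temperature
tmin = [1 1949 -2.54; 2 1949 -0.07];
tmax = [1 1949 16.04; 2 1949 18.74];
tavg = [tmin(:, 1:2), (tmin(:, 3) + tmax(:, 3)) / 2];
```

Row k of tminFiles and tmaxFiles names the two files of the same pair, so a single loop over k can import, average, and write each result.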

More Answers (1)

Rik on 16 Jul 2019
You can put the contents of your loop in a separate function and call that with arrayfun if you insist, but that will not really change the grouping of your files.
So if that is not your question, please explain further what kind of problem you are having exactly.
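A rough sketch of what the arrayfun route could look like, assuming the loop body is reduced to an anonymous helper (the name decompose is hypothetical) and that a struct array with a name field stands in for the dir output:

```matlab
% Hypothetical stand-in for the struct array returned by dir('*.dat')
names = {'LatLon_34_-103_tmax_cm1.dat', 'LatLon_34_-103_tmax_cm2.dat', ...
         'LatLon_34_-103_tmin_cm1.dat', 'LatLon_34_-103_tmin_cm2.dat'};
files = struct('name', names);   % 1x4 struct array

% Helper applied to each struct element: strip extension, split on '_'
decompose = @(f) strsplit(erase(f.name, '.dat'), '_');

% 'UniformOutput', false is needed because each call returns a cell array
parts = arrayfun(decompose, files, 'UniformOutput', false);
fileDecompose = vertcat(parts{:});   % 4x5 cell, one row per file
```

This only replaces the explicit loop with arrayfun; the grouping of the tmin and tmax files still has to be done on fileDecompose afterwards.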
1 Comment
Ellen Maas on 16 Jul 2019
Thanks for responding, Rik. I'll provide more detail. As I stated above, my specific question is how to get elements from the file names into vectors; as the code shows, I am currently using strtok in a for loop. The approach I am pursuing (and asking about here) is to build vectors so that all the file name elements are preserved in a searchable format, letting me group the sister files and average them.
For example, given the data above, as vectors, I would want to end up with:
lat = [34 34 34 34]
lon = [-103 -103 -103 -103]
climate_attribute = ["tmin" "tmax" "tmin" "tmax"]
climate_model = ["cm1" "cm1" "cm2" "cm2"]
But in the provided code as written, only one file is processed at a time and the file elements are stored as scalars and overwritten with each pass. How do I preserve them into vectors? A bonus would be to eliminate the loops too.
I'll add this to the question to help others understand where I'm going.
