Most efficient method to search through file names?
I have a large number of files whose names all follow the format
SSSTTTTMMYY
Where the 'encoding' of the file name breaks down into something like this:
SSS - Three-letter code referencing a location (I know the locations and have a MATLAB table that relates these codes to location names)
TTTT - Four-letter code for the 'type' of data we have captured (also values we already have)
MMYY - The month and year in which the data was taken.
So for example, we may have something like:
LDNACPD0618
where LDN = London, ACPD = Average captured pollution data, 0618 = June, 2018.
So here is the actual question:
I want to build a function that can search through these file names using any of the following criteria:
- Search based on choice of site location e.g. All data from site LDN
- Get all files between a number of dates e.g. Select all data between 0118 - 0318
- Search based on choice of 'type' of data e.g. All data that is ACPD
- Or a combination of the above e.g. All data from LDN between 0118 - 0318
What is the most efficient way to do this other than making three separate functions to check each section of the file name? Would something like a regular expression work?
Many thanks for your help and advice in advance!
1 Comment
"Would something like a regular expression work?"
Matching the SSS and TTTT parts would not be too difficult, but matching a range of dates really requires converting to date (e.g. date number or datetime) and then doing a logical comparison.
Start by splitting the names up (e.g. using regexp or indexing) and then:
- compare SSS using strcmp
- compare TTTT using strcmp
- convert MMYY to datetime and compare using logical comparisons.
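To make those steps concrete, here is a minimal sketch rather than a definitive implementation. The file names are made up, the pattern assumes the location and type codes are uppercase letters, and two-digit years are assumed to mean 20YY. It uses datenum for the comparison; datetime would work along the same lines.

```matlab
% Hypothetical file names following the SSSTTTTMMYY pattern
names = {'LDNACPD0618', 'LDNACPD0218', 'PARACPD0118'};

% Split each name into location, type, month and year.
% Assumption: every name matches the pattern (non-matching names would
% give empty tokens and break the concatenation below).
tok = regexp(names, '^([A-Z]{3})([A-Z]{4})(\d{2})(\d{2})$', 'tokens', 'once');
tok = vertcat(tok{:});              % N-by-4 cell array of char vectors

loc = tok(:, 1);                    % location codes
typ = tok(:, 2);                    % data type codes
mm  = str2double(tok(:, 3));        % month as a number
yy  = str2double(tok(:, 4));        % two-digit year as a number

% Assumption: all years are 20YY; use the first day of each month
dn = datenum(2000 + yy, mm, 1);

% Logical comparisons: site LDN, dated January to March 2018
isLDN    = strcmp(loc, 'LDN');
inRange  = dn >= datenum(2018, 1, 1) & dn <= datenum(2018, 3, 1);
selected = names(isLDN & inRange);
```

Because each criterion yields a logical mask, any combination of site, type and date range is just an & of the relevant masks.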
Accepted Answer
More Answers (1)
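Folder = 'D:\Your\Folder';
FileList = dir(fullfile(Folder, '*.*'));
FileList = FileList(~[FileList.isdir]);  % skip '.', '..' and subfolders
NameList = {FileList.name};
% NameList = {'SSSTTTT0617', 'SSSTTTT0631', 'WWWQQQQ0724'}
% Split each name by position (assumes every name has at least 11 characters)
Data.Location = cellfun(@(s) s(1:3), NameList, 'UniformOutput', false);
Data.Type = cellfun(@(s) s(4:7), NameList, 'UniformOutput', false);
Data.Date = cellfun(@(s) sscanf(s(8:11), '%d'), NameList, 'UniformOutput', true);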
% Data which have the Location = 'SSS':
Match = FindData(Data, 'Location', 'SSS')
% Data which have the Location = 'SSS' and the date 0631:
Match = FindData(Data, 'Location', 'SSS', 'Date', 631)
% Data which have the Type 'TTTT' and a date between 0631 and 0801:
Match = FindData(Data, 'Type', 'TTTT', 'DateRange', [631, 801])
... etc
function Match = FindData(Data, varargin)
% Return a logical mask of the entries that satisfy all name-value criteria.
Match = true(size(Data.Location));  % one element per file name
for k = 1:2:numel(varargin)
    switch lower(varargin{k})
        case 'location'
            Match = Match & strcmp(Data.Location, varargin{k+1});
        case 'type'
            Match = Match & strcmp(Data.Type, varargin{k+1});
        case 'date'
            Match = Match & (Data.Date == varargin{k+1});
        case 'daterange'
            % Compares MMYY as plain numbers, so the order is only
            % chronological when both limits lie in the same year.
            Match = Match & (Data.Date >= varargin{k+1}(1) & ...
                             Data.Date <= varargin{k+1}(2));
        otherwise
            error('Unknown job: %s', varargin{k});
    end
end
% Maybe:
% Match = find(Match);
end
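One caveat about the 'daterange' case: MMYY read as a plain number does not sort chronologically once the range crosses a year boundary. A possible workaround, sketched here under the assumption that all years fall in the same century, is to reorder the digits into a YYMM key before comparing. The variable names and sample values are only illustrative.

```matlab
% MMYY values stored as plain numbers, as in Data.Date (leading zero is lost)
mmyy = [1217, 118, 618];            % Dec 2017, Jan 2018, Jun 2018

% Reorder MMYY into a YYMM key that sorts chronologically
mm  = floor(mmyy / 100);            % month part
yy  = mod(mmyy, 100);               % two-digit year part
key = yy * 100 + mm;                % e.g. 1217 -> 1712, 118 -> 1801

% Range December 2017 to January 2018, also expressed as YYMM
lo = 1712;
hi = 1801;
inRange = key >= lo & key <= hi;

% For comparison: the same range applied to the raw MMYY numbers
naive = mmyy >= 1217 & mmyy <= 118;
```

Here inRange picks the first two entries, while naive selects nothing because 1217 is larger than 118.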
3 Comments
S G
on 10 Jun 2019
Jan
on 10 Jun 2019
Data is a struct with three fields, each of which contains an array holding one part of the file names. If you provide some real test data, a better-fitting answer is possible. I guessed that the file names can be obtained by calling dir on a specific folder. This was my best guess based on this explanation:
I have a large number of files that all have file name formats that are of the form SSSTTTTMMYY
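As a small illustration of that struct layout (the file names below are hypothetical stand-ins for what dir would return), each field holds one element per file, so a logical mask built from the fields can index straight back into the name list:

```matlab
% Hypothetical name list standing in for {FileList.name}
NameList = {'LDNACPD0618', 'LDNACPD0218', 'PARACPD0118'};

% One field per part of the name, one element per file
Data.Location = cellfun(@(s) s(1:3), NameList, 'UniformOutput', false);
Data.Type     = cellfun(@(s) s(4:7), NameList, 'UniformOutput', false);
Data.Date     = cellfun(@(s) sscanf(s(8:11), '%d'), NameList);

% A mask over the fields selects the matching file names
Match    = strcmp(Data.Location, 'LDN') & (Data.Date == 218);
Selected = NameList(Match);
```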
S G
on 10 Jun 2019