Order of files pulled from Datastore

Austin on 5 Oct 2023
Commented: Walter Roberson on 5 Oct 2023
I have created a datastore with around 1000 CSV files, named filename_1, filename_2, ..., filename_1000. When I try to read the data from the datastore into a new table, though, the files are read in a weird order:
How can I get it to read the files in the typical 1,2,3,etc order?
Here is the rest of the code for reference:
Thanks! Austin

Accepted Answer

Walter Roberson on 5 Oct 2023
Edited: Walter Roberson on 5 Oct 2023
The files in a datastore are processed in the order listed in its Files property.
When you call datastore() with a wildcard name or one or more directory names, the order in which the Files property is populated is not defined.
  1 Comment
Walter Roberson on 5 Oct 2023
You have a few possibilities:
  1. Somehow construct an explicit list of files in the order you want, and datastore() that list instead of passing in a directory or wildcard; or
  2. After the original datastore is constructed, extract the Files property, do something to get the list sorted in the order you want, and set the result back as the Files property; or
  3. Change your expectation that there is a "wrong" order to process the files in.
The File Exchange contribution natsortfiles might help you with sorting.
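As a rough illustration of option 2, here is a minimal sketch that sorts on the trailing number without any File Exchange code. It assumes the names end in _<number>.csv, and the temporary folder and the three small files exist only to make the example self-contained; they are not from the original question.

```matlab
% Build a small throwaway folder so the sketch is self-contained (assumption).
folder = tempname;
mkdir(folder);
for k = [1 2 10]
    fid = fopen(fullfile(folder, sprintf('filename_%d.csv', k)), 'w');
    fprintf(fid, 'value\n%d\n', k);
    fclose(fid);
end

% Let the datastore discover the files in whatever order it chooses.
ds = tabularTextDatastore(folder, 'FileExtensions', '.csv');

% Extract the numeric suffix of every file name and sort on it.
% A name that does not match the pattern would make t{1} fail here.
files  = ds.Files;
tokens = regexp(files, '_(\d+)\.csv$', 'tokens', 'once');
nums   = cellfun(@(t) str2double(t{1}), tokens);
[~, idx] = sort(nums);

% Set the reordered list back as the Files property, then read.
ds.Files = files(idx);
T = readall(ds);
```

The same pattern should carry over to natsortfiles: replace the regexp/sort lines with a call to it on ds.Files, if you have it installed.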
datastore() should not be expected to guess that you want the files to be processed in some particular order.
For example, if you pass a list of file extensions to datastore(), which order should it use?
  1. Process all directories in the order given, looking for the first file extension, then process all of the directories again in the order given, looking for the second file extension?
  2. Process each directory in order; within each directory, process all files with the first file extension, then all files with the second file extension?
  3. Process each directory in order; for any particular file "base" name, look for the base name with each of the given file extensions in the order passed?
  4. Process each directory in order; for any particular name, if the file extension matches any of the passed file extensions, add it to the list?
If nested directories are provided, should the complete parent folder be processed without descending into any subfolders, and then each subfolder descended in order? Or should each subfolder be processed as it is encountered alphabetically? Or should the subfolders of a folder all be processed before the parent folder is processed?
If order is important, then use whatever facilities are needed to create an ordered list of files, and pass that ordered list to datastore().
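One possible way to build such an ordered list is sketched below, using dir() and a numeric sort on the digits in each name. The folder, the file names and the 'filename_*.csv' pattern are assumptions made for illustration; the sketch also relies on the datastore keeping an explicit file list in the order given, as described above.

```matlab
% Throwaway folder with a few files, only so the example can run (assumption).
folder = tempname;
mkdir(folder);
for k = [1 2 10]
    fid = fopen(fullfile(folder, sprintf('filename_%d.csv', k)), 'w');
    fprintf(fid, 'value\n%d\n', k);
    fclose(fid);
end

% List the files yourself instead of handing the folder to datastore().
listing = dir(fullfile(folder, 'filename_*.csv'));
names = {listing.name};
names = names(:);                      % column cell array of bare file names

% Sort by the number embedded in each name rather than by character order.
nums = str2double(regexp(names, '\d+', 'match', 'once'));
[~, idx] = sort(nums);
ordered = fullfile(folder, names(idx));

% Pass the explicit, ordered list of full paths to the datastore.
ds = tabularTextDatastore(ordered);
```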


More Answers (1)

dpb on 5 Oct 2023
"...around 1000 csv files, labeled filename_1, filename_2,...filename_1000"
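The list is sorted in ASCII (character-by-character) order, so names whose number begins with 0 come first, then those beginning with 1, and so on; filename_10 therefore sorts ahead of filename_2. You should have used zero-padded numbers: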
N=1000;
fnames=compose('filename_%04d.csv',0:N).';
fnames([1:5 end-4:end])
ans = 10×1 cell array
    {'filename_0000.csv'}
    {'filename_0001.csv'}
    {'filename_0002.csv'}
    {'filename_0003.csv'}
    {'filename_0004.csv'}
    {'filename_0996.csv'}
    {'filename_0997.csv'}
    {'filename_0998.csv'}
    {'filename_0999.csv'}
    {'filename_1000.csv'}
Or there is sort_nat on the File Exchange, which will make up for the original oversight... :)
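To see the effect of the zero padding, a small comparison like the following may help. It is only a sketch with four made-up file numbers: it sorts unpadded and padded names with plain sort() and checks which result matches numeric order.

```matlab
k = [1 2 10 100];                       % illustrative file numbers (assumption)

% Same numbers, formatted without and with zero padding.
plain  = compose('filename_%d.csv',   k).';
padded = compose('filename_%04d.csv', k).';

% sort() on a cell array of character vectors uses character-code order.
sortedPlain  = sort(plain);
sortedPadded = sort(padded);

% Unpadded names fall out of numeric order; padded names keep it.
plainKeepsOrder  = isequal(sortedPlain,  plain);
paddedKeepsOrder = isequal(sortedPadded, padded);
```

Here sortedPlain places filename_10.csv and filename_100.csv ahead of filename_2.csv, while sortedPadded stays in numeric order. If the files already exist with unpadded names, renaming them (for example with movefile) is one way to get the padded form.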
