Getting a List of Files.... Should Be Easy But

Matt O'Brien
Matt O'Brien on 30 Aug 2022
Commented: Matt O'Brien on 14 Sep 2022
I have a GUI app 99% complete, but have spent several days trying to resolve what, to me, should be an extremely simple task.
Namely,
I need to get a list of all files and sub files from a given foldername down (incl all subfolders).
The list should exclude:
  • Directories.
  • System Files
  • Hidden Files
  • Files which start with a full stop (ie Mac hidden files)
  • Files which have ".Spotlight-V100" as part of the folder path.
  • Files which have ".Trashes" as part of the folder path.
Questions.
  1. Is there a MATLAB command to do this?
  2. Or is there an elegant routine to do this?
  3. Or is there a MATLAB plug-in which will do this?
I have created a matlab script to do this on Windows, but it is approx 90 lines of code. I hate to think what I might have to do to get this working on a Mac as well.
I will be using this regularly, so wish to make this as elegant/efficient as possible.
  4 comments
Matt O'Brien
Matt O'Brien on 30 Aug 2022
[~,d]=system(sprintf('dir /A:-H-D-S /S /B "%s"',rootpath));
This is really close, but it delivers the result as one long char vector (1x13424 char).
Is there any way to get the result as an array or structure of some kind?
dpb
dpb on 30 Aug 2022
@Voss -- good catch -- I threw that in at the end...will fix original.


Accepted Answer

Matt O'Brien
Matt O'Brien on 14 Sep 2022
Edited: Matt O'Brien on 14 Sep 2022
I think this might be the final version..... I may find some other odd directories which may need to be excluded, but this code provides a good basis for coding for such scenarios.
dinfo = dir('F:\**\*');
full_filenames = fullfile({dinfo.folder}, {dinfo.name});
%filtering
[~,stats] = cellfun(@fileattrib, full_filenames);
is_unwanted = [stats.hidden]==1 | [stats.system]==1;
dinfo(is_unwanted) = [];
full_filenames(is_unwanted) = [];
dinfo([dinfo.isdir]) = []; %exclude directories
dinfo( startsWith({dinfo.name}, '.') ) = []; %exclude hidden files, which is same as . files on MacOS
dinfo( contains({dinfo.folder}, {'.Spotlight-V100', '.Trashes','System Volume Information'}) ) = []; %exclude those particular directories
My thanks to Walter Roberson in particular, and all who have contributed to this discussion / solution.
PS: I am happy working with the struct dinfo as the final output of this snippet.
  4 comments
dpb
dpb on 14 Sep 2022
Edited: dpb on 14 Sep 2022
Oh,yeah...forgot about stats being an array when putting together...just put the [] you had back that I left out to assimilate back into vectors...
...
full_filenames = fullfile({dinfo.folder}, {dinfo.name}).'; % convert to column for pretty
[~,stats] = cellfun(@fileattrib, full_filenames);
dinfo=dinfo(~([stats.hidden]|[stats.system]|[stats.directory]));
...
Matt O'Brien
Matt O'Brien on 14 Sep 2022
Yes... just confirming .... the following snippet works with my test data. Thanks for the prompt response.
CULL_STRS={'.Spotlight-V100', '.Trashes','System Volume Information'};
dinfo = dir('F:\**\*');
%filtering
full_filenames = fullfile({dinfo.folder}, {dinfo.name});
[~,stats] = cellfun(@fileattrib, full_filenames);
dinfo=dinfo(~([stats.hidden]|[stats.system]|[stats.directory]));
dinfo=dinfo(~contains({dinfo.folder},CULL_STRS));
full_filenames = fullfile({dinfo.folder}, {dinfo.name});
Bringing the ad-hoc folders into a variable and using a single filter statement on the attributes is very elegant coding.
I will be using it in a variety of scenarios and it will probably see substantial use in due course. I will post back here if I find any unusual gotchas. BTW, I have only tested on Windows; I will test on Mac within a few months.
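One point worth checking before the Mac test: Walter notes elsewhere in this thread that fileattrib returns NaN for the hidden and system attributes on macOS and Linux. As far as I know, applying | directly to NaN raises an error because NaN cannot be converted to logical, so the single-filter form may need the ==1 comparison to run there. A minimal sketch, using a made-up stats array rather than real fileattrib output:

```matlab
% Hypothetical stats array shaped like fileattrib output on macOS/Linux,
% where the hidden and system fields are NaN rather than 0 or 1.
stats = struct('hidden', {NaN, NaN}, 'system', {NaN, NaN}, 'directory', {0, 1});

% Comparing with 1 first yields plain logical vectors, so NaN simply
% counts as "attribute not set" and the | operator never sees a NaN.
is_unwanted = [stats.hidden]==1 | [stats.system]==1 | [stats.directory]==1;
keep = ~is_unwanted;   % true for the plain file, false for the directory
```

With real data the same mask would be used as dinfo = dinfo(keep).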


More Answers (9)

dpb
dpb on 30 Aug 2022
Moved: dpb on 30 Aug 2022
  1. Directly as one command, no.
  2. Probably not extant, no.
  3. Certainly not specifically, no.
I "know nuthink!" of Mac OS ls equivalent, but doesn't really seem as though it should be particularly difficult -- certainly can't see why it would take some 90 lines of code.
rootpath='YourRootPath';
[~,d]=system(sprintf('dir /A:-H-D-S /S /B "%s"',rootpath)); % quote the path so folder names with spaces work
d=string(split(d,newline));
d=d(strlength(d)>0);
will give you a list of all files that are not hidden/directories/system files for the rootpath folder and all subfolders in a list on Windows using the default CMD command shell. Mac surely has something equivalent.
A few well-chosen filters against the not-wanted list of this list should be pretty simple to code; a regexp guru there might be of some help; that wouldn't be me, however... :)
I've always wondered why/wished for that TMW would have just supported the basic OS command line switches for the native OS in its incarnation of dir -- having it neutered as it is to "plain vanilla" is a real pain.
ADDENDUM:
Thinking about the exclude list, led me to thinking it's not that hard, either...with the caveat you have had the discipline to not name a file with the excluded path name in a directory not in the excluded list.
excludeList=[".Spotlight-V100"; ".Trashes"]; % filename content to exclude
d=d(~contains(d,excludeList)); % get rid of 'em...
I guess even that part above could be handled if used
d=d(~contains(fileparts(d),excludeList)); % exclude unwanted folders only
I dunno how you would handle @Walter Roberson's comment re: Mac and OS files -- although I'd hope you aren't putting your data where the OS stores its files so it wouldn't be an issue, anyway.
  2 comments
Walter Roberson
Walter Roberson on 30 Aug 2022
ls('-ld', tempdir)
drwxrwxrwt 8 root root 200 Aug 30 22:45 /tmp/
You can see that on macOS and Linux, basic command line switches for MATLAB ls() are supported.
dpb
dpb on 30 Aug 2022
Yeah, but not for Winwoes -- nor does dir for either which was my specific complaint.



Walter Roberson
Walter Roberson on 30 Aug 2022
query_folder = tempdir; %set as appropriate, tempdir is just used for example purposes
dinfo = dir( fullfile(query_folder, '**', '*') );
dinfo([dinfo.isdir]) = []; %exclude directories
dinfo( startsWith({dinfo.name}, '.') ) = []; %exclude hidden files, which is same as . files on MacOS
dinfo( contains({dinfo.folder}, {'.Spotlight-V100', '.Trashes'}) ) = []; %exclude those particular directories
Unless, that is, when you refer to "hidden files", you refer to things such as ~/Library . If so then you would need to use ls -@ to query looking for the extended attribute com.apple.FinderInfo 32 or system() out to xattr looking for com.apple.FinderInfo
Well, except for the fact that if you add a color tag to a file then the com.apple.FinderInfo attribute with value 32 gets added to the file, and com.apple.metadata:_kMDItemUserTags gets added as well. If you then remove the color from the file, then com.apple.FinderInfo gets removed but a com.apple.metadata:_kMDItemUserTags attribute gets left behind. So to determine whether a file is hidden you need to look for com.apple.FinderInfo is present with value 32 but com.apple.metadata:_kMDItemUserTags is not present...
  2 comments
Matt O'Brien
Matt O'Brien on 31 Aug 2022
My app should not be going near system directories. It is mainly to transfer and process images from SD cards to a hard drive. In some cases the SD cards are backed up to an SSD drive in the field. It is while trying to process from the SSD drives that I am encountering large volumes of hidden, system, Spotlight and trash scenarios.
My app needs to cater for all such scenarios, but will not be looking in traditional system folders.
Matt O'Brien
Matt O'Brien on 31 Aug 2022
Edited: Matt O'Brien on 31 Aug 2022
Sony marks certain folders or files as hidden on SD cards used within Sony cameras. E.G. There is a database stored on the card, used by the camera for various camera functions. It is understandable that Sony has marked such files as hidden. I need to make sure these files are not included in the list I wish to transfer and process.



Matt O'Brien
Matt O'Brien on 30 Aug 2022
I will explore the various suggestions..... Thanks... all good.
Some of the hidden files can be recognised with a name starting with a full stop. Some are dependent on the system or hidden attribute. Very messy.
Getting back to the system command.
The suggested syntax gives me a long stream of chars, with no delimiter between each file name/path.
The following is a quick and fairly crude work around.
Here I split using the Drive Letter and will then need to add it back in at some stage. I can do that with code in my app.
rootpath='W:\MoB_AllData\MoB_TestCopyOfSD_Card Valid\';
[~,d]=system(sprintf('dir /A:-H-D-S /S /B "%s"',rootpath));
DirSplit = split(d,'W:\')
This gives me an array of filenames (ie file urls)... which I can work with.
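Splitting on the drive letter works, but it strips the prefix and would break if 'W:\' ever appeared inside a path. A minimal sketch of an alternative, assuming the output of dir /B separates entries with line breaks (which is how it normally prints); the char vector below is a made-up stand-in for the value returned by system():

```matlab
% Stand-in for the char vector returned by system(); the paths are made up.
d = sprintf('W:\\Card\\DCIM\\100MSDCF\\DSC00001.ARW\nW:\\Card\\DCIM\\100MSDCF\\DSC00002.ARW\n');

files = splitlines(string(d));          % one element per line (handles \n and \r\n)
files = strtrim(files);                 % drop stray whitespace around each entry
files = files(strlength(files) > 0);    % discard the empty trailing entry
```

The result is a column string array of full paths with the drive letter intact.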
I will need to reconcile the results.... (of the various suggestions).... Will work on this tomorrow (midnight here in Dublin, Ireland)... and post back some comments when I check the details.
  11 comments
dpb
dpb on 31 Aug 2022
I've not delved into how to control it (although I may have done and just forgotten it), but when I "bang" to the OS under MATLAB here on Win10, it still uses CMD.EXE. system is builtin, so I can't see what it actually does; I presume, however, that it uses a start command to spawn a new CMD.EXE process, passing it the rest of the command as parameters.
I deduce that because personally on Windows I use the JPSoftware replacement command processor instead of the MS-supplied CMD, and even if Windows is configured to use it as default, MATLAB still uses CMD.EXE, not the system default. I've also thought that very rude of MATHWORKS to have done and not use the system default so the user could have their toolsets at hand if they wish. With the TakeCommand processor from JPSoft, one could add the various exclusions into its enhanced DIR and do virtually all the culling before returning the list. That, however, doesn't help for Mac nor those who don't use it, of course.
But, looks as though you've basically got the problem solved with Walter's esteemed help so I'll retire from the field here unless you have something else specific along this line you care to pursue.
Good luck!!!
Matt O'Brien
Matt O'Brien on 31 Aug 2022
Walter's elegant package of code is so sweet. I am seriously impressed. I just tested it and found a glitch .... hoping Walter will recognise the issue.... posted a comment relative to his post.
I will explore the PowerShell / System(dir) combo. It looks very powerful... but I need to put time and energy into grasping this tool kit... in due course.
Given the elegance of Walter's package of code, I intend to create a general purpose function I can use to generate the list of files required and provide options to include /exclude system files, hidden files, etc. I can then use this for both Win and Mac versions of my app. Also impressed with your use of the dir command. It has been so long since I was near a dos prompt. My thanks. This has been a most useful discussion.
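A general-purpose function along these lines might look like the sketch below. The function name listUserFiles and its arguments are hypothetical; the body only combines the filters already posted in this thread. The ==1 comparison is there because fileattrib reportedly returns NaN for hidden and system on macOS and Linux. If fileattrib fails for some entry, the cellfun call may error, so treat this as a starting point rather than a finished routine.

```matlab
function dinfo = listUserFiles(rootFolder, cullStrs)
%LISTUSERFILES Recursively list regular, non-hidden files under rootFolder.
%   Hypothetical helper; the name and arguments are illustrative only.
    if nargin < 2
        cullStrs = {'.Spotlight-V100', '.Trashes', 'System Volume Information'};
    end
    dinfo = dir(fullfile(rootFolder, '**', '*'));
    dinfo([dinfo.isdir]) = [];                       % drop directories
    if isempty(dinfo), return; end

    % Drop dot-files and anything inside a culled folder.
    unwanted = startsWith({dinfo.name}, '.') | contains({dinfo.folder}, cullStrs);
    dinfo(unwanted) = [];
    if isempty(dinfo), return; end

    % Drop files flagged hidden or system (NaN on macOS/Linux counts as "not set").
    fullNames = fullfile({dinfo.folder}, {dinfo.name});
    [~, stats] = cellfun(@fileattrib, fullNames);
    dinfo([stats.hidden]==1 | [stats.system]==1) = [];
end
```

A call such as dinfo = listUserFiles('F:\') would then return the same kind of struct array that dir produces.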



Matt O'Brien
Matt O'Brien on 30 Aug 2022
Edited: Matt O'Brien on 30 Aug 2022
Just some feedback ...
rootdir = 'W:\MoB_AllData\MoB_TestCopyOfSD_Card Valid\**\*.*'
dinfo = dir( rootdir );
dinfo([dinfo.isdir]) = []; %exclude directories
dinfo( startsWith({dinfo.name}, '.') ) = []; %exclude hidden files, which is same as . files on MacOS
dinfo( contains({dinfo.folder}, {'.Spotlight-V100', '.Trashes'}) ) = []; %exclude those particular directories
This works beautifully .. but the result includes a small number of hidden files ....
I will add filters tomorrow to filter dinfo() based on the system and hidden attributes as well as the above filters.

Matt O'Brien
Matt O'Brien on 31 Aug 2022
Here is my current state of play.....
This elegant snippet (from Walter Roberson ...much appreciated) does most of the heavy lifting.
rootdir = strcat(MyDrive,'**\*.*'); %MyDrive stores full path to folder of interest
MyFileList=dir(rootdir); %get info of files/folders in current directory
%dinfo = dir( rootdir );
MyFileList([MyFileList.isdir]) = []; %exclude directories
MyFileList( startsWith({MyFileList.name}, '.') ) = []; %exclude hidden files, which is same as . files on MacOS
MyFileList( contains({MyFileList.folder}, {'.Spotlight-V100', '.Trashes'}) ) = []; %exclude those particular directories
But it does not catch Windows files with the attribute of Hidden.
This following snippet catches the hidden Windows files (using my test data).
% remove all folders (already done, but this initialises isBadFile)
isBadFile = cat(1,MyFileList.isdir); % all directories are bad
% loop to identify hidden files
for iFile = find(~isBadFile)' % loop only non-dirs
    % on OSX, hidden files start with a dot
    %isBadFile(iFile) = strcmp(MyFileList(iFile).name(1),'.'); % already removed in code above
    if ~isBadFile(iFile) && ispc
        % check for hidden Windows files - only works on Windows
        tmpName = MyFileList(iFile).name;
        tmpFullName = strcat(MyFileList(iFile).folder,'\',tmpName); % semicolon added to suppress output
        [~,stats] = fileattrib(tmpFullName);
        if stats.hidden
            isBadFile(iFile) = true;
        end
        if stats.system
            isBadFile(iFile) = true;
        end
    end
end
% remove bad files
MyFileList(isBadFile) = [];
I found the above snippet which catered for Hidden Windows files and also added a filter to catch Windows files with a System Attribute.
I am focused at the moment to get my app working for Windows. Will use it for a few months before I consider adjusting to make it work for Windows and Mac.
In due course I will explore the use of PowerShell to generate the equivalent of a Dir list. This involves the use of Get-ChildItem cmdlet in Recursive mode. That is for another day or project.
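For anyone who wants to try the PowerShell route from inside MATLAB, here is a rough sketch. The helper name buildPowerShellListCommand is hypothetical, and the Get-ChildItem switches (-LiteralPath, -Recurse, -Attributes) are used as I understand them from the PowerShell documentation; I have not checked how the attribute filter treats files inside hidden folders, so the result should be reconciled against the dir-based list before relying on it. A path containing a single quote would also need escaping.

```matlab
function cmd = buildPowerShellListCommand(rootFolder)
%BUILDPOWERSHELLLISTCOMMAND Build a command line for use with system().
%   Hypothetical helper; it only assembles the string, it does not run it.
    psScript = sprintf(['Get-ChildItem -LiteralPath ''%s'' -Recurse ' ...
        '-Attributes !Hidden+!System+!Directory | ' ...
        'ForEach-Object { $_.FullName }'], rootFolder);
    % Wrap the script in double quotes so CMD.EXE passes it through as one argument.
    cmd = sprintf('powershell -NoProfile -Command "%s"', psScript);
end
```

On Windows the command would be run with [status, out] = system(buildPowerShellListCommand(rootpath)); and the output split into lines in the same way as the dir /B output.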
However, I have already created a simple MATLAB GUI to allow the selection of a folder and generate the file list, outputting it to an Excel file, splitting it up into component fields/columns such as Drive, Folder, FileName, Extension, Bytes, Date. This mini-app uses the code included here. This will work for me to generate lists until I get a better handle on Windows PowerShell, and it has the advantage that it should work for Mac (with a few tweaks) and Windows.
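The export step can be sketched roughly as follows. This is not the code from the mini-app; exportFileList is a hypothetical name, and the drive is left inside the Folder column rather than split out separately.

```matlab
function T = exportFileList(dinfo, outFile)
%EXPORTFILELIST Write a dir-style struct array to a spreadsheet.
%   Hypothetical helper; the column names are illustrative only.
    folders = string({dinfo.folder}.');
    % Split each file name into base name and extension.
    [~, base, ext] = cellfun(@fileparts, {dinfo.name}.', 'UniformOutput', false);
    T = table(folders, string(base), string(ext), [dinfo.bytes].', ...
        datetime([dinfo.datenum].', 'ConvertFrom', 'datenum'), ...
        'VariableNames', {'Folder', 'FileName', 'Extension', 'Bytes', 'Date'});
    writetable(T, outFile);   % the file extension (e.g. .xlsx) selects the format
end
```

For example, T = exportFileList(MyFileList, 'file_list.xlsx'); writes one row per file.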
As a MATLAB beginner, my main GUI app might have been a bit too complex, but it delivers functionality which will save me a lot of time ingesting images, movies, sound and other digital assets, in a structured manner, from SD cards to a digital repository. I can start to use it now for my real-world needs.
I am truly impressed and grateful for the promptness and quality of the responses here.
  2 comments
Walter Roberson
Walter Roberson on 31 Aug 2022
We already know the names cannot be folders, so there is no point testing that.
The code below will work on macOS and Linux as well -- those will return NaN for the hidden and system attributes, but by specifically testing == 1, both NaN and 0 are treated as false, so NaN does not need to be special-cased.
No loop is needed.
tmpFullNames = fullfile( {MyFileList(iFile).folder}, {MyFileList(iFile).name});
[~,stats] = fileattrib(tmpFullNames);
isBadFile(iFile) = [stats.hidden]==1 | [stats.system]==1;
myFileList(isBadFile) = [];
Matt O'Brien
Matt O'Brien on 31 Aug 2022
Brilliant.... will test with my data in the next few days.



Matt O'Brien
Matt O'Brien on 31 Aug 2022
Edited: Matt O'Brien on 5 Sep 2022
To Walter.
I ran into a problem with the following.
"No loop is needed."
iFile in the following lines is not defined. This is the loop variable in the code I was using to cater for system and hidden files, so I am not sure how iFile should be defined or initialised. I get the following error.
"Unrecognized function or variable 'iFile'."
[Final working code can be found at this link: https://uk.mathworks.com/matlabcentral/answers/1791125-getting-a-list-of-files-should-be-easy-but?s_tid=mlc_ans_email_view#comment_2341705 Thanks to Walter for his valuable snippet.]
rootdir = strcat(MyDrive,'**\*.*'); %MyDrive stores full path to folder of interest
MyFileList=dir(rootdir); %get info of files/folders in current directory
%dinfo = dir( rootdir );
MyFileList([MyFileList.isdir]) = []; %exclude directories
MyFileList( startsWith({MyFileList.name}, '.') ) = []; %exclude hidden files, which is same as . files on MacOS
MyFileList( contains({MyFileList.folder}, {'.Spotlight-V100', '.Trashes'}) ) = []; %exclude those particular directories
tmpFullNames = fullfile( {MyFileList(iFile).folder}, {MyFileList(iFile).name});
[~,stats] = fileattrib(tmpFullNames);
isBadFile(iFile) = [stats.hidden]==1 | [stats.system]==1;
myFileList(isBadFile) = [];
Maybe I am using your suggested code in the wrong place. I can see the overall package of code is a 'thing of beauty' and an inspiration to me to achieve this standard of coding. Apologies if I am missing something obvious.
Regards.
  11 comments
Matt O'Brien
Matt O'Brien on 1 Sep 2022
Edited: Matt O'Brien on 1 Sep 2022
Some background. Ignore the following if not interested in the background.
Dealing with Digital Assets from Movie/Still cameras.
In most typical cases, people using Matlab dir feature may be looking for a specific range of files such as *.txt or *.jpg, or be exploring directories which do not contain unusual, hidden or system files or directories.
In my case I am dealing with digital assets from cameras (ie SD cards or similar) in a card reader, or a direct copy of an SD card, for example a card backed up to disk.
Occasional shooters may never notice the potential complexity of the folder structure on an SD card, but volume shooters (i.e. sports, wildlife, etc.) will generate thousands of images in a very short time frame, which may then be stored in a series of DCIM folders. Each camera maker needs to conform to a DCIM standard, but also has freedom in certain aspects of how the images are stored within a DCIM folder substructure. The gotcha for many people is that a DCIM folder will normally only hold 9999 images. If more than 9999 images are shot, then they go into a different DCIM subfolder. If the images are copied directly from card to disk, then there is a risk of either duplicate file names or, horror of horrors, images from one DCIM subfolder overwriting images from another DCIM subfolder when copied to a target destination folder (i.e. the file names are the same but the image contents are different). The situation for video, sound and other digital assets can be even more complex.
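The overwrite risk can be checked before any copy takes place. A minimal sketch, using made-up file and folder names, that flags names occurring in more than one source folder (compared case-insensitively, on the assumption that the destination volume ignores case):

```matlab
% Hypothetical names as they might come from two DCIM subfolders.
names   = ["DSC00001.JPG"; "DSC00002.JPG"; "DSC00001.JPG"];
folders = ["E:\DCIM\100MSDCF"; "E:\DCIM\100MSDCF"; "E:\DCIM\101MSDCF"];

[uniqueNames, ~, idx] = unique(lower(names));  % group identical names
counts   = accumarray(idx, 1);                 % occurrences of each name
clashing = uniqueNames(counts > 1);            % names that appear more than once
isClash  = ismember(lower(names), clashing);   % rows that would collide on a flat copy

% List the colliding files with their source folders.
disp([folders(isClash), names(isClash)]);
```

With a real listing, names and folders would come from string({dinfo.name}.') and string({dinfo.folder}.').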
The camera (stills or video) may also use the card as the storage that lets the photographer manage the captured images (rename folders and files, rate images, delete images, etc.) directly on the camera, in the field. Therefore the maker may have hidden databases and status files to manage this real-time camera functionality.
As photographers, videographers and editors may use a large range of applications to manage their images, lots of other rubbish gets added by these apps, such as '.Spotlight-V100', '.Trashes', etc.
So my SD cards look very simple, but may contain a nightmare of hidden delights. It is super critical to filter out all such files or folders.
As I test out my app, using different cameras, I expect to find more surprises.
My app caters for managing the transfer of image, sound, video and related files from card (or backup of card) to disk, providing the option to filter files based on format (eg raw, jpg, wav, mov,xmp,etc) and date. Eg. Jpgs from a specific date may be directed to Project A, and video files with a different date may be directed to project B. The app guarantees that the original camera number, unique image number and other details are retained, as well as keeping an Excel based audit log of all files copied.
I had the main GUI built and the major application logic coded (and tested).... but kept getting stuck with unwanted system or hidden files.
Thanks to your help here, I have made a significant step forward.
dpb
dpb on 1 Sep 2022
Interesting. I've never been "bit" by the camera bug -- I've bought several with the intent over the years, but they all end up just sitting on the shelf gathering dust, so I've never poked around with any.
I do think this thread is another "shot across the bow" that TMW should strengthen the builtin functionality of dir() to support the underlying OS switches.
I've not tried the dir route via expressly "banging" to CMD.EXE with the command string to avoid PowerShell -- going that route has the benefit of returning the FQNs as a list without having to construct them from the struct returned by dir(), as well as the various attribute screening being done first instead of later.
Anyways, looks as though you've basically got it sorted -- the other vendor idiosyncrasies are likely just going to be additional specific strings to add to the exclusion list. Unfortunately, one can imagine that may continue to grow as new models/features are introduced...



Matt O'Brien
Matt O'Brien on 1 Sep 2022
I am a serious amateur photographer. A civil engineer by qualification but got into the Information Technology world ex college. Lived in the world of large / global scale enterprise computing at an application and infrastructure layer. I have deep sympathy for the end user and will always champion ease of use above technical challenges. Given my background, I cannot live with inefficient workflows. So the modern world of photography is a nightmare both for the beginner and the seasoned professional (ie flow from image capture to a published or printed product).
I have dipped my toe into this space because I see photographers (especially high-volume shooters) struggle with the needless complexity of getting images from their cameras to a back-end computer and having them arrive there in a structured and audited manner.
I left the world of writing code in the early 80's, but have been responsible for large development and implementation teams in a large variety of industries in manufacturing, distribution and banking. I am familiar with quite a number of IDEs, but I am impressed with the small footprint of MATLAB. However, while I know how to structure and design complex apps or systems, I seriously struggle with MATLAB syntax and the ever-present challenge of converting simple ideas to simple lines of code.
I like the fact that migrating from Windows to Mac is at least feasible, but probably another mountain to be climbed in due course.
I cannot believe that the dir function has not got simple switches to make it easier to filter system, hidden or sub directories, and there are other anomalies working with drives and/or folders. Yes, the attrib variable is there, but it should be simpler to use.
The important thing is to make progress... even if it is only tiny steps.
Resolving the dir issue for me might be small in the scheme of things but was vip in terms of creating a working solution. Photographers who might get to use this will never know what happens under the hood, and maybe there is a little satisfaction in that.
My best regards and deep felt appreciation to the members here..

Matt O'Brien
Matt O'Brien on 12 Sep 2022
I discovered another condition which needs to be catered for in order to get a list of all files and subfiles from a selected drive or folder.
dir entries whose 'folder' field contains 'System Volume Information' should be excluded from the final list generated.
  11 comments
Matt O'Brien
Matt O'Brien on 13 Sep 2022
Edited: Matt O'Brien on 13 Sep 2022
This generates an error ....
Error using fileattrib
Argument must be a text scalar.
Error in Test (line 6)
[~,stats] = fileattrib(full_filenames);
I cannot debug now, will revisit this evening. Have to travel for an appointment.
I suspect fileattrib needs an actual file name (char or string) and may not work with an array of filenames. If so, I will need to loop.
Walter Roberson
Walter Roberson on 13 Sep 2022
dinfo = dir('C:\**\*');
full_filenames = fullfile({dinfo.folder}, {dinfo.name});
%filtering
[~,stats] = cellfun(@fileattrib, full_filenames);
is_unwanted = [stats.hidden]==1 | [stats.system]==1;
dinfo(is_unwanted) = [];
full_filenames(is_unwanted) = [];
(On Mac, fileattrib is happy to work on the cell array when I test)



Matt O'Brien
Matt O'Brien on 13 Sep 2022
Thanks. Just home, so will pick up in the morning. Documentation on fileattrib seems to indicate char or string but not array. I can explore the use of @ feature.

Release

R2021b
