readall stops because of non-existing data. Can I skip these?

Kristine on 19 Sep 2025 at 20:55
Edited: dpb on 21 Sep 2025 at 15:55
Hi y'all,
When I run my code I get an error:
Error using matlab.io.datastore.TabularDatastore/readall (line 196)
Unable to parse a "Numeric" field when reading row 39898, field 7.
Actual Text: "N,07121.27550,W,2,28,0.9,0.79,M,,,5,0131*01"
Expected: A number or literal "NaN", "Inf". (possibly signed, case insensitive
Is there a way to have the readall function skip these ,,, non-existing values in my data?
Code:
function [] = each_day_table(folder_referenced, output_folder_location)
    dimension = size(folder_referenced) ;
    for i = 1:dimension(2)
        datastore_result = datastore(folder_referenced(i)) ;
        original_data = readall(datastore_result) ;
        new_table = new_table_useful_data(original_data) ;
        [~, name, ext] = fileparts(folder_referenced(i)) ;
        output_filename = fullfile(output_folder_location, "new_" + name + ext) ;
        writetable(new_table, output_filename)
    end
end

folder_with_files = dir(input_folder_location) ; % define directory where files are
filenames = fullfile({folder_with_files.folder}, {folder_with_files.name}) ; % access file path and name information
csvFiles = endsWith(filenames, '.csv') ; % logical index of the entries that end in .csv (fullfile provides extra info we don't want)
filenames = filenames(csvFiles) ; % create cell array with all file names
each_day_table(filenames, output_folder_location) ; % applies all the functions to the files
I've attached a reference file.
  3 Comments
Kristine on 19 Sep 2025 at 21:45
It is similar, but I don't think I can use the solution provided by the accepted answer, because it doesn't work for empty cells. Or at the very least I can't make it work.
dpb on 19 Sep 2025 at 22:03
Edited: dpb on 20 Sep 2025 at 15:11
Similar, but not the same, Walter. There, the file contained additional page-header lines with specific text that could be identified, and those particular values were then set as
% ...,'TreatAsMissing',{'Time','Board0_Ai0'}, ...
One would have thought the defaults of "" and NaN for 'MissingValue' should have worked, but apparently not.
There's the 'MissingRule' in an import options object that I guess one could see if
% ...,'MissingRule','omitrow', ...
would be accepted, but I'd guess it unlikely and afaict there's no facility to use an import object here.
Well, let's just see...
ds=datastore('20250214RAW.log.csv','MissingRule','omitrow')
Error using datastore (line 178)
'MissingRule' is not a recognized parameter. For a list of valid name-value pair arguments, see the documentation for datastore.
So that isn't allowed and
ds=datastore('20250214RAW.log.csv');
data=readall(ds,'MissingRule','omitrow');
fails as well; it says 'UseParallel' is the only recognized named parameter.
I'm sure there must be a way, but it certainly isn't clear to me how to beat it into submission. A regular call to readtable with the import options object would work, but there's no way to use one here that I can see.
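For reference, the readtable route mentioned above might look like the following minimal sketch. The demo file, its three-column layout, and the variable types are made up for illustration and are not taken from the attached log; the rule settings are one possible choice. Here empty fields are filled with NaN (the default 'fill' rule) and only rows that fail numeric conversion are dropped, because setting MissingRule to 'omitrow' would discard every row that has any empty field.

```matlab
% Sketch: read one comma-delimited file with readtable and an import
% options object. The demo file and its layout are hypothetical.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
% Row 2 has an empty field; row 3 has text where a number is expected.
fprintf(fid, '1.5,10,N\n2.5,,N\n3.5,abc,N\n4.5,40,N\n');
fclose(fid);

opts = delimitedTextImportOptions('NumVariables', 3, ...
    'Delimiter', ',', ...
    'VariableTypes', {'double', 'double', 'char'});
opts.MissingRule     = 'fill';      % empty numeric fields become NaN
opts.ImportErrorRule = 'omitrow';   % drop rows that cannot be converted

t = readtable(fname, opts);         % keeps rows 1, 2 and 4
delete(fname);
```

Calling this per file inside the existing loop, in place of datastore and readall, is an assumption about how it would fit the original code.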


Accepted Answer

dpb on 19 Sep 2025 at 23:30
Edited: dpb on 21 Sep 2025 at 15:55
A workaround until somebody can come up with the clean answer:
function fixupmissingfields(infile,outfile)
  % substitute NA into missing comma-delimited fields in input file
  % and write to output file
  fidi=fopen(infile,'r');
  fido=fopen(outfile,'w');
  while ~feof(fidi)
    l=fgetl(fidi);              % read line excluding terminator
    w=split(l,',');             % get fields
    w(strlength(w)==0)={'NA'};  % insert the missing indicator in empty fields
    l=char(join(w,','));        % put back together
    fprintf(fido,'%s\n',l);     % output to new file with newline
  end
  fclose(fidi);
  fclose(fido);
end
Alternatively, you could mimic the 'omitrow' of readtable with
function omitmissingrows(infile,outfile)
  % skip records with missing comma-delimited fields in input file
  % and write to output file
  fidi=fopen(infile,'r');
  fido=fopen(outfile,'w');
  while ~feof(fidi)
    l=fgets(fidi);          % read line including terminator
    if contains(l,',,')     % at least one missing interior field
      continue              % skip this record from output
    end
    fprintf(fido,'%s',l);   % output to new file
  end
  fclose(fidi);
  fclose(fido);
end
Illustrate on the attached file adding NA...
infile='20250214RAW.log.csv';
outfile=strrep(infile,'RAW','CLEANED');
fixupmissingfields(infile,outfile)
type(outfile);
NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
235957.055,4131.96230,N,7121.29101,W,2,34,0.8,0.02,M,NA,NA,6,0131*0E
235958.055,4131.96230,N,7121.29099,W,2,34,0.8,0.02,M,NA,NA,7,0131*00
235959.055,4131.96230,N,7121.29098,W,2,34,0.8,0.02,M,NA,NA,7,0131*00
0.055,4131.96230,N,7121.29096,W,2,34,0.8,0.02,M,NA,NA,7,0131*0F
1.055,4131.96230,N,7121.29094,W,2,34,0.8,0.02,M,NA,NA,4,0131*0F
2.055,4131.96230,N,7121.29092,W,2,34,0.8,0.02,M,NA,NA,5,0131*0B
3.055,4131.96231,N,7121.29090,W,2,34,0.8,0.02,M,NA,NA,6,0131*0A
4.055,4131.96230,N,7121.29087,W,2,33,0.8,0.02,M,NA,NA,7,0131*0C
5.055,4131.96231,N,7121.29085,W,2,33,0.8,0.03,M,NA,NA,7,0131*0F
6.055,4131.96231,N,7121.29084,W,2,33,0.8,0.03,M,NA,NA,7,0131*0D
7.055,4131.96231,N,7121.29081,W,2,33,0.8,0.03,M,NA,NA,4,0131*0A
8.055,4131.96231,N,7121.29079,W,2,33,0.8,0.03,M,NA,NA,5,0131*03
9.055,4131.96231,N,7121.29077,W,2,33,0.8,0.03,M,NA,NA,6,0131*0F
  1 Comment
dpb on 20 Sep 2025 at 17:53
Edited: dpb on 20 Sep 2025 at 21:41
ADDENDUM
The above assumes the file may be too large to fit in memory; if it can be read all at once, the first function can be vectorized as
function fixupmissingfields2(infile,outfile)
  % substitute NA into missing comma-delimited fields in input file
  % and write to output file - in memory version
  w=split(readlines(infile),',');
  w(strlength(w)==0)={'NA'};
  writematrix(w,outfile,'Delimiter',',','QuoteStrings',0);
end
infile='20250214RAW.log.csv';
outfile=strrep(infile,'RAW','CLEANED');
fixupmissingfields2(infile,outfile);
type(outfile)
NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA
235957.055,4131.96230,N,7121.29101,W,2,34,0.8,0.02,M,NA,NA,6,0131*0E
235958.055,4131.96230,N,7121.29099,W,2,34,0.8,0.02,M,NA,NA,7,0131*00
235959.055,4131.96230,N,7121.29098,W,2,34,0.8,0.02,M,NA,NA,7,0131*00
0.055,4131.96230,N,7121.29096,W,2,34,0.8,0.02,M,NA,NA,7,0131*0F
1.055,4131.96230,N,7121.29094,W,2,34,0.8,0.02,M,NA,NA,4,0131*0F
2.055,4131.96230,N,7121.29092,W,2,34,0.8,0.02,M,NA,NA,5,0131*0B
3.055,4131.96231,N,7121.29090,W,2,34,0.8,0.02,M,NA,NA,6,0131*0A
4.055,4131.96230,N,7121.29087,W,2,33,0.8,0.02,M,NA,NA,7,0131*0C
5.055,4131.96231,N,7121.29085,W,2,33,0.8,0.03,M,NA,NA,7,0131*0F
6.055,4131.96231,N,7121.29084,W,2,33,0.8,0.03,M,NA,NA,7,0131*0D
7.055,4131.96231,N,7121.29081,W,2,33,0.8,0.03,M,NA,NA,4,0131*0A
8.055,4131.96231,N,7121.29079,W,2,33,0.8,0.03,M,NA,NA,5,0131*03
9.055,4131.96231,N,7121.29077,W,2,33,0.8,0.03,M,NA,NA,6,0131*0F
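Once the empty fields carry an explicit NA marker, a datastore could be told to treat that marker as missing. The following is a minimal sketch of that follow-up step, not something run on the attached file: the demo file, its three columns, and the format list are hypothetical, and a real log would need one format specifier per column.

```matlab
% Sketch: read an NA-filled file through tabularTextDatastore.
% The demo file and its three-column layout are hypothetical.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '1.5,10,N\n2.5,NA,N\n3.5,30,N\n');   % NA marks a missing field
fclose(fid);

ds = tabularTextDatastore(fname, ...
    'ReadVariableNames', false, ...           % file has no header row
    'Delimiter', ',', ...
    'TextscanFormats', {'%f','%f','%q'}, ...  % numeric, numeric, text
    'TreatAsMissing', 'NA');                  % NA in numeric fields -> NaN
data = readall(ds);
delete(fname);
```

Fixing the formats explicitly avoids relying on automatic type detection, which is a design choice of this sketch rather than a requirement.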
