Problems in reading large matrix with unintended gaps between data

Poulomi Ganguli on 16 May 2021
Commented: Rik on 18 May 2021
Hello,
I think I attached the output file instead of the actual input file, so I have revised my submission here. The problem is with the last column, DRNRF. There is a gap between two pairs of digits: '02' '00', '03' '25', and so on. These two numerals should be merged while reading, giving '0200' and '0325' respectively. For the rest of the procedure I am following this post, which works: https://de.mathworks.com/matlabcentral/answers/823485-problems-in-reading-large-matrix-with-large-empty-cells
Any help in this regard would be appreciated. Thanks!
1 comment
Mathieu NOE on 17 May 2021
Well, I don't see a gap, only a dot between the two fields. Or am I missing something?
INDEX YEAR MN DT ..MAX ..MIN AW ..R/F .EVP DRNRF
-------------------------------------------------
42045 1985 01 01 2.5 -3.9 002.4 02.00
42045 1985 01 02 6.5 -2.9 003.4 03.25
42045 1985 01 03 5.0 -3.9 007.6 02.35
42045 1985 01 04 6.0 -1.1 000.0 00.00
42045 1985 01 05 5.2 -1.1 009.2 08.15
42045 1985 01 06 1.8 -4.1 009.5 11.20
When I import your file, the last column appears as:
outdata(:,end)
ans =
2.0000
3.2500
2.3500
0
8.1500
11.2000


Answers (1)

Mathieu NOE on 18 May 2021
Hello again,
Here is my simple fix: read the last two fields as separate columns (8 and 9), then merge them arithmetically.
Your DRNRF data then appears in the (updated) outdata(:,8).
[outdata,head] = readclm('Test_data.txt',9);
outdata(:,8) = outdata(:,8)*100+outdata(:,9);
outdata(:,9) = []; % clean up
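To make the arithmetic concrete, here is a minimal sketch of the same merge on a few values. The vectors hi and lo are hypothetical stand-ins for outdata(:,8) and outdata(:,9), filled with the digit pairs from the sample rows above. Note that the merge gives numbers (200, 325, ...); the zero-padded text form '0200' asked for in the question needs an extra formatting step, shown here with sprintf.

```matlab
% Hypothetical stand-ins for outdata(:,8) and outdata(:,9)
hi = [2; 3; 2; 0; 8; 11];      % first two digits of DRNRF
lo = [0; 25; 35; 0; 15; 20];   % last two digits of DRNRF

% Same merge as above: shift the first pair by two decimal places
merged = hi*100 + lo;          % 200, 325, 235, 0, 815, 1120

% Optional: zero-padded 4-character text, e.g. '0200'
asText = arrayfun(@(v) sprintf('%04d', v), merged, 'UniformOutput', false);
```

If only the numeric values are needed downstream, the last line can be dropped.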
I like the readclm subfunction; here it is (maybe the textscan aficionados will have a comment about my coding methods...):
function [outdata,head] = readclm(filename,nclm,skip,formt)
% READCLM Reads numerical data from a text file into a matrix.
% Text file can begin with a header or comment block.
% [DATA,HEAD] = READCLM(FILENAME,NCLM,SKIP,FORMAT)
% Opens file FILENAME, skips first several lines specified
% by SKIP number or beginning with comment '%'.
% Then reads next several lines into a string matrix HEAD
% until the first line with numerical data is encountered
% (that is until first non-empty output of SSCANF).
% Then reads the rest of the file into a numerical matrix
% DATA in a format FORMAT with number of columns equal
% to number of columns of the text file or specified by
% number NCLM. If data does not match the size of the
% matrix DATA, it is padded with NaN at the end.
%
% READCLM(FILENAME) reads data from a text file FILENAME,
% skipping only commented lines. It determines number of
% columns by the length of the first data line and uses
% the floating point format '%g';
%
% READCLM uses FGETS to read the first lines and FSCANF
% for reading data.
% Kirill K. Pankratov, kirill@plume.mit.edu
% 03/12/94, 01/10/95.
% Defaults and parameters ..............................
formt_dflt = '%g'; % Default format for fscanf
addn = nan; % Number to fill the end if necessary
% Handle input ..........................................
if nargin<1, error(' File name is undefined'); end
if nargin<4, formt = formt_dflt; end
if nargin<3, skip = 0; end
if nargin<2, nclm = 0; end
if isempty(nclm), nclm = 0; end
if isempty(skip), skip = 0; end
% Open file ............................
[fid,msg] = fopen(filename);
if fid<0, error('readclm:fileOpen','%s',msg); end % Outputs would be undefined on a plain return
% Find header and first data line ......................
is_head = true;
jl = 0;
head = ' ';
while is_head % Add lines to header.....
    s = fgets(fid); % Get next line
    if ~ischar(s) % End of file reached before any data line
        fclose(fid);
        error('readclm:noData','No numerical data found in %s',filename);
    end
    jl = jl+1;
    is_skip = jl<=skip || s(1)=='%'; % Skipped line or '%' comment
    out1 = sscanf(s,formt); % Try to read this line
    % If unreadable by SSCANF or skip, add to header
    is_head = isempty(out1) || is_skip;
    if is_head && ~is_skip
        head = char(head,deblank(s)); % char replaces the deprecated str2mat
    end
end
head = head(2:size(head,1),:);
% Determine number of columns if not specified
out1 = out1(:)';
l1 = length(out1);
if ~nclm, nclm = l1; end
% Read the rest of the file ..............................
if l1~=nclm % First line format is different from ncolumns
    outdata = fscanf(fid,formt);
    lout = length(outdata)+l1;
    ncu = ceil(lout/nclm);
    lz = nclm*ncu-lout;
    outdata = [out1'; outdata(:); ones(lz,1)*addn];
    outdata = reshape(outdata,nclm,ncu)';
else % Regular case
    outdata = fscanf(fid,formt,[nclm inf]);
    outdata = [out1; outdata']; % Add the first line
end
fclose(fid); % Close file ..........
3 comments
Mathieu NOE on 18 May 2021
Hello,
That was more than 20 years ago. It was certainly in the FEX section, but it has probably been deleted since then.
This function was part of a package (see attached). I don't know if it is still useful today, as recent releases offer many more text-file processing possibilities than the pre-Y2000 era did.
Rik on 18 May 2021
That does make sense. Sometimes FEX submissions are deleted.
To read text files I have written the readfile function (which you can get from the FEX or through the AddOn-manager). It works on every release that supports && and || (i.e. R13 (v6.5) and later). Apparently it was a good idea to have such a function (including the ability to read from a URL), as Mathworks implemented a similar function in R2020b (readlines).
I prefer to split tasks into different functions, so I would read the entire file as text, then, outside that function, remove the header and comment lines and parse the rest with textscan, regexp, or str2double, whichever makes sense.
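As a rough sketch of that split-the-tasks approach (not the readfile function itself), the snippet below works on text that is already in memory. The sample text is hypothetical: a shortened header plus two rows with the '02 00' gap described in the question. In practice txt would come from fileread('Test_data.txt') or a similar reader. It assumes that data rows start with a digit and that the gap always separates two 2-digit groups at the end of the line.

```matlab
% Hypothetical sample; in practice: txt = fileread('Test_data.txt');
txt = sprintf(['INDEX YEAR MN DT ..MAX ..MIN AW DRNRF\n', ...
               '--------------------------------------\n', ...
               '42045 1985 01 01 2.5 -3.9 002.4 02 00\n', ...
               '42045 1985 01 02 6.5 -2.9 003.4 03 25\n']);

% Step 1: split the text into lines and keep only the data rows
rows = regexp(txt, '\r?\n', 'split');
isData = ~cellfun(@isempty, regexp(rows, '^\s*\d', 'once')); % assumption: data rows start with a digit
rows = rows(isData);

% Step 2: close the gap in the last column, e.g. '02 00' -> '0200'
rows = regexprep(rows, '(\d{2})\s+(\d{2})\s*$', '$1$2');

% Step 3: tokenise and convert
tokens = cellfun(@(s) strsplit(strtrim(s)), rows, 'UniformOutput', false);
drnrf  = cellfun(@(t) t{end}, tokens, 'UniformOutput', false); % text form, keeps leading zeros
values = str2double(vertcat(tokens{:}));                       % numeric matrix, one row per line
```

Keeping drnrf as text preserves the leading zero ('0200'), while values(:,end) holds the same column as numbers (200, 325). vertcat only works here because every data row has the same number of fields; rows with missing fields would need separate handling.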
