open text file constructed in a specified manner

3 views (last 30 days)
Rica
Rica on 18 Mar 2015
Edited: dpb on 19 Mar 2015
Hi all,
I have a big text file in this construction:
blablblala
blablabla
1 2 3
4 5 6
2 5 8
blablblala
blablabla
5 8 9
5 2 3
2 5 6
blablblala
blablabla
2 5 8
98 56 42
5 8 7
blablblala
blablabla
...
...
...
blablblala
blablabla
...
...
...
blablblala
blablabla
.
.
.
the structure is the same , only the numbers do change. ho could open this kind of text file and get the data in a cell:
C{1}=
1 2 3
4 5 6
2 5 8
C{2}=
5 8 9
5 2 3
2 5 6
C{3}=...
and so on.
Thank you.
  1 Comment
per isakson
per isakson on 18 Mar 2015
The strings
blablblala
blablabla
do they contain substrings, which can be used as indicators of
START OF NUMERICAL DATA
and
END OF NUMERICAL DATA

Sign in to comment.

Answers (2)


per isakson
per isakson on 18 Mar 2015
Edited: per isakson on 18 Mar 2015
  4 Comments
dpb
dpb on 18 Mar 2015
This one is almost trivial, Per...altho one thing I dislike about textscan is that it won't figure out the number/line automagically so if you don't know that you have to do some work to discern that for the format string. But, if it is known a priori, you can do this one by inspection on the fly in one try...
>> type blabla.txt
blablblala
blablabla
1 2 3
4 5 6
2 5 8
blablblala
blablabla
5 8 9
5 2 3
2 5 6
blablblala
blablabla
2 5 8
98 56 42
5 8 7
>> fid=fopen('blabla.txt');
>> fgetl(fid);
>> while ~feof(fid)
cell2mat(textscan(fid,'%f%f%f','headerlines',2', ...
'collectoutput',1))
end
ans =
1 2 3
4 5 6
2 5 8
ans =
5 8 9
5 2 3
2 5 6
ans =
2 5 8
98 56 42
5 8 7
>> fid=fclose(fid);
>>
The only "trick" used here is fgetl to throw away the '%%' line. You would think that 'commentstyle','%' would take care of that, but no; the 'headerlines',2 option takes over first and "eats" those first two lines so then there are no comment lines in the file after that. I think that's a debatable implementation decision, but it's hard to argue it's really a bug.
@Per--I think you're overthinking textscan. For files like the above simply look for the regularity and what it takes to
  1. get past any unique header information first, and then
  2. write a format expression (or group thereof) that handles the repetitive block as if it were the only one in the file and finally,
  3. repeat that block until done.
In the above he '%f' field will repeat until it fails; using three simply causes it to return the result as an array of three columns instead of as a vector; the file pointer will be at the beginning of the next header line where the failure to convert to a numeric value occurred so then the repeat of skipping the two headerlines picks up and "lather, rinse and repeat"...

Sign in to comment.

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by