MATLAB Answers

Out of memory using textscan - fails to read in one part at a time

Jesper Kamp Jensen on 6 Nov 2015
Commented: Walter Roberson on 8 Nov 2015
I hope someone can help, as I have been struggling with this problem for a while now.
I have huge data files (3-4 GB) that I want to read in, saving parts of them to new, smaller files.
When I run the code below it eventually runs out of memory, but surely the matrices G1 and MAT are reassigned on each pass through the while loop? Or am I missing something obvious? Another problem is that the files I manage to save before the computer runs out of memory are not separated by N lines: each one is only shifted by 4 lines, so 4 new lines are added in each newly saved file.
I really hope someone can help! Thanks for your time.
Best regards, Jesper
while ~feof(fid)
G1 = textscan(fid,formatSpec,N,'HeaderLines','8','Delimiter',' '); % Read one block at a time and store it temporarily
MAT = cell2mat(G1(:,[6 8 10 11 13 15 17 18 19 20 21 22 23 24 25])); % Collect the wanted columns into one numeric matrix
k = k+1 % Counter used to give each saved file a new name
sti1 = ['C:\Data\Youngsund\Dat files\' name '-' k_str '_new.dat'];

Answers (2)

Walter Roberson on 6 Nov 2015
Yes, all of your variables except k will get overwritten each time through the while loop. You could be even more explicit about that by adding a "clear" statement.
If only 4 lines at a time are getting added then the implication is that your formatSpec fails to match the input either some time in the 4th line or at the beginning of the 5th line.
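A rough way to check for that kind of mismatch, sketched on a small made-up string rather than the real file: textscan stops at the first field it cannot convert and returns whatever it has read so far, so columns of unequal length (or fewer rows than requested) suggest the format stopped matching. The sample text and the two-column format here are hypothetical and only illustrate the check.

```matlab
% Hypothetical sample: the second line has a non-numeric second field.
txt = sprintf('1 2\n3 x\n5 6\n');

% textscan accepts a character vector in place of a file identifier.
[C, pos] = textscan(txt, '%f %f');

% Number of values actually read into each column.
rowsPerColumn = cellfun(@numel, C);

if any(rowsPerColumn < 3)
    % Fewer than the 3 expected rows: the format stopped matching somewhere.
    fprintf('Stopped early, rows per column: %s\n', mat2str(rowsPerColumn));
    fprintf('Scanning stopped at character %d of %d\n', pos, numel(txt));
end
```

The same check can be applied to the output of each textscan call in the loop, comparing the row counts against N.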
It is not obvious to me why you would be running out of memory, but one thing I would suggest is that you replace the dlmwrite() with a few lines of code that output the way you want. For example before the loop,
outfmt = repmat('%g,', 1,15);
outfmt(end:end+1) = '\n';
Then replace the dlmwrite with
outfid = fopen(sti1, 'wt');
fprintf(outfid, outfmt, MAT.'); %you need the .' that is there!
fclose(outfid); %close each output file, otherwise file handles pile up in the loop
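To see what that format string does and why the transpose matters, here is a minimal sketch that uses sprintf on a small stand-in matrix instead of writing to a file. The 2-by-3 MAT and the variable nCols are assumptions made for illustration; the thread itself uses 15 columns.

```matlab
% Small stand-in for MAT: 2 rows, 3 columns (the thread uses 15 columns).
MAT = [1 2 3; 4 5 6];
nCols = size(MAT, 2);

% Build '%g,%g,%g\n': repeat '%g,' and overwrite the trailing comma with '\n'.
outfmt = repmat('%g,', 1, nCols);
outfmt(end:end+1) = '\n';

% sprintf and fprintf consume their data column by column, so passing the
% transpose emits one row of MAT per output line.
txt = sprintf(outfmt, MAT.');
disp(txt)
```

Without the transpose the values would be written in column order, so the rows of the output file would not correspond to the rows of MAT.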

Jesper Kamp Jensen on 6 Nov 2015
Thanks for your answer. When I apply the changes you suggest (only for a few runs) it gives good results, so I must now see whether it can run all the way through the file without running out of memory.
However, something must be wrong with formatSpec. I've tried setting N=4, and for k=1 I get the four lines 1, 2, 3 and 4, but for the following:
  • k=2, lines 5, 6, 7 and 8
  • k=3, lines 8, 9, 10 and 11
  • k=4, lines 8, 9, 10 and 11
  • k=5, lines 8, 9, 10 and 11
  • k=6, lines 9, 10, 11 and 12
  • k=7, lines 10, 11, 12 and 13
  • k=8, lines 11, 12, 13 and 14
  • k=9, lines 11, 12, 13 and 14
  • k=10, lines 12, 13, 14 and 15
Is it only the formatSpec that is causing that?
formatSpec='%4s %f %f %f %f %f %c %f %c %f %f %c %f %c %f %c %f %f %f %f %f %f %f %f %f %*[^\n]';
  1 Comment
Walter Roberson on 8 Nov 2015
I do not understand what the chart of k and line numbers is intended to indicate.
Could you post about 9 lines of input?
My guess is that you have a problem with blanks and using %s and %c.
Note: you can simplify your processing by using
formatSpec = '%*4s %*f %*f %*f %*f %f %*c %f %*c %f %f %*c %f %*c %f %*c %f %f %f %f %f %f %f %f %f %*[^\n]';
G1 = textscan( fid, formatSpec, N, 'HeaderLines', '8', 'Delimiter',' ', 'CollectOutput', 1);
MAT = G1{1};
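As a self-contained illustration of that pattern, the sketch below reads a small temporary file in blocks of N lines, using %* to skip unwanted fields and 'CollectOutput' to get one numeric matrix per block. The file contents, the shortened format and the names fname and blocks are made up for the example. Skipping the header once with fgetl before the loop is a choice made for this sketch, on the assumption that header lines occur only at the top of the file.

```matlab
% Write a small hypothetical file: 2 header lines, then 5 data lines.
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, 'header 1\nheader 2\n');
fprintf(fid, 'id%d %d %d trailing text\n', [1:5; 10:10:50; 100:100:500]);
fclose(fid);

formatSpec = '%*s %f %f %*[^\n]';  % skip the label, keep two numbers, drop the rest
N = 2;                             % lines per block
blocks = {};

fid = fopen(fname, 'rt');
for h = 1:2
    fgetl(fid);                    % skip the header once, before the loop
end
while ~feof(fid)
    G1 = textscan(fid, formatSpec, N, 'CollectOutput', 1);
    if isempty(G1{1})
        break                      % nothing parsed: stop rather than loop forever
    end
    blocks{end+1} = G1{1};         % numeric block with at most N rows
end
fclose(fid);
delete(fname);
```

Each element of blocks plays the role of MAT for one pass; in the real loop it would be written out and discarded instead of being kept in memory.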
