parfor (file reading)

AP on 10 Nov 2011
Hi all,
I am trying to use parfor to speed up reading 1000 ASCII files. Each file has the following format:
  • 10 header lines describing the data.
  • The remaining lines are in the format '%f %f %f %f' and contain the values of the variables x, y, z1 and z2. Each file has up to 10000 such lines.
x and y represent the rectangular domain in which z1 and z2 have been measured, so the domain is the same in all 1000 files. I want to use parfor and store one 10000×1 vector for x, one 10000×1 vector for y, one 10000×1000 array for z1 and one 10000×1000 array for z2.
I used the following pseudocode:
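parfor i = 1:1000
    % fname: name of the i-th file (how it is built is not shown here)
    fid = fopen(fname, 'r');
    data = textscan(fid, '%f %f %f %f', 'HeaderLines', 10);
    fclose(fid);
    x = data{1};
    y = data{2};
    z1(:, i) = data{3};
    z2(:, i) = data{4};
end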
I get the error "The variable z1 in a parfor cannot be classified". The error may come from the restrictions on indexing inside a parfor loop.
Is there a better way for reading these 1000 files in parallel?
Thanks.
  1 comment
Edric Ellis on 10 Nov 2011
That code should work - in your real code, are you using 'z1' in some other way within the loop?
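For reference, here is a minimal sketch of a loop shape that parfor should be able to classify: z1 and z2 are preallocated and indexed only as (:, i) inside the loop. The sample files, the cell array fnames, the sizes nFiles and nRows, and the NaN padding for files with fewer rows are all assumptions made for illustration; they are not taken from the original code.

```matlab
% Sketch only: nFiles, nRows and the temporary file names are illustrative.
nFiles = 4;      % the question has 1000 files
nRows  = 6;      % the question has up to 10000 rows per file
folder = tempname;
mkdir(folder);

% Write small sample files: 10 header lines, then 'x y z1 z2' rows.
fnames = cell(nFiles, 1);
for k = 1:nFiles
    fnames{k} = fullfile(folder, sprintf('sample_%04d.txt', k));
    fid = fopen(fnames{k}, 'w');
    for h = 1:10
        fprintf(fid, 'header line %d\n', h);
    end
    rows = [(1:nRows)', 2 * (1:nRows)', k * ones(nRows, 1), -k * ones(nRows, 1)];
    fprintf(fid, '%f %f %f %f\n', rows');
    fclose(fid);
end

% x and y are the same in every file, so read them once outside the loop.
fid = fopen(fnames{1}, 'r');
first = textscan(fid, '%f %f %f %f', 'HeaderLines', 10);
fclose(fid);
x = first{1};
y = first{2};

% Preallocate; z1 and z2 are indexed only as (:, i) inside the loop,
% which is the form parfor treats as sliced output variables.
z1 = nan(nRows, nFiles);
z2 = nan(nRows, nFiles);
parfor i = 1:nFiles
    fid = fopen(fnames{i}, 'r');
    data = textscan(fid, '%f %f %f %f', 'HeaderLines', 10);
    fclose(fid);
    % Pad short files with NaN so every column has nRows entries.
    col1 = nan(nRows, 1);
    col2 = nan(nRows, 1);
    n = min(numel(data{3}), nRows);
    col1(1:n) = data{3}(1:n);
    col2(1:n) = data{4}(1:n);
    z1(:, i) = col1;
    z2(:, i) = col2;
end

rmdir(folder, 's');   % remove the sample files
```

As far as I know, the classification error usually appears when the same array is indexed in more than one way inside the loop, for example reading z1(1, 1) or z1(:, i-1) alongside the assignment to z1(:, i), so that is worth checking in the real code.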


Answers (1)

Daniel Shub on 10 Nov 2011
I am not sure exactly how MATLAB handles file reading or how hard drives handle multiple read requests, but my guess is that distributing an I/O-limited job across multiple processors will not speed it up.
  1 comment
Walter Roberson on 10 Nov 2011
Surprisingly, you can get better performance with parallel reads -- at least if you are using SCSI drives with ENQ (enqueue) turned on which allows the drive to re-order read requests according to which destination is "closest" to where it currently is. In common situations, the performance increases up to four parallel reads; in some data access patterns, the performance can continue to climb beyond four parallel reads, but the performance improvement past 4 is not wonderful (but if you have terabytes to get through, you'll take whatever performance increase you can get.)
It also helps if the file you are reading is not compressed and you use scatter/gather I/O.
I do not have any information on drive queue management in the newer PC drives.
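Whether parallel reads help on a given machine can be measured directly. The following is a rough timing sketch, not a benchmark: the file count, file size and file names are assumptions made for illustration, the first parfor call may include the time needed to start the parallel pool, and operating-system file caching can favor whichever loop runs second.

```matlab
% Sketch only: file count, size and names are illustrative assumptions.
nFiles = 8;
nRows  = 2000;
fmt    = '%f %f %f %f';
folder = tempname;
mkdir(folder);

% Create sample files with 10 header lines followed by four numeric columns.
fnames = cell(nFiles, 1);
for k = 1:nFiles
    fnames{k} = fullfile(folder, sprintf('timing_%04d.txt', k));
    fid = fopen(fnames{k}, 'w');
    for h = 1:10
        fprintf(fid, 'header line %d\n', h);
    end
    rows = [(1:nRows)', (1:nRows)', rand(nRows, 2)];
    fprintf(fid, '%f %f %f %f\n', rows');
    fclose(fid);
end

% Serial read of the third column of every file.
zSerial = zeros(nRows, nFiles);
tic;
for i = 1:nFiles
    fid = fopen(fnames{i}, 'r');
    data = textscan(fid, fmt, 'HeaderLines', 10);
    fclose(fid);
    zSerial(:, i) = data{3};
end
tSerial = toc;

% The same read with parfor.
zParallel = zeros(nRows, nFiles);
tic;
parfor i = 1:nFiles
    fid = fopen(fnames{i}, 'r');
    data = textscan(fid, fmt, 'HeaderLines', 10);
    fclose(fid);
    zParallel(:, i) = data{3};
end
tParallel = toc;

fprintf('for: %.3f s, parfor: %.3f s\n', tSerial, tParallel);
rmdir(folder, 's');   % remove the sample files
```

Running the comparison several times, with file sizes and counts close to the real data, should give a more trustworthy picture than a single run.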
