Mapreduce on parallel cluster - Database or disk full - How control storage of intermediate files?

Question

Christian el 9 de Jul. de 2019

0
Enlazar

Enlace directo a esta pregunta

https://es.mathworks.com/matlabcentral/answers/470861-mapreduce-on-parallel-cluster-database-or-disk-full-how-control-storage-of-intermediate-files

Respondida: Christian el 10 de Jul. de 2019

I wish to calculate several statistics (Spectra, Correlation Functions, etc.) of ~400 files with 6e6 doubles per file and afterwards average over all files to get average spectra, correlation functions, etc. To make things fast, I try to use mapreduce on a parallel cluster. This works like a charm as long as there are relatively few files (~100), but with a larger amount of files I get this error message:

Error using parallel.mapreduce.KeyValueOutputStore/addmulti (line 63)
Error in adding keys and values.
Error in Analysis20190708>Analysis (line 115)
    addmulti(intermKVStore, {'StatNames'}, {Stats});
Error in parallel.internal.pool.deserialize>@(data,info,intermKVStore)Analysis(data,Parameters,info,intermKVStore)
Error in mapreduce (line 116)
    outds = execMapReduce(mrcer, ds, mapfun, reducefun, parsedStruct);
Error in Analysis20190708 (line 72)
outDS = mapreduce(ds, mapper, @reduceAnalysis,inpool);
Caused by:
    The database /tmp/filename/TaskOutput7.db is full. (database or disk is full)

The message occurs after around 50% of the map phase is done, but later when I reduce the size of the result vectors (less frequently sampled spectra for example). I checked with an admin and the /tmp indeed has very limited free space.

The question is now: How do I tell MATLAB to store these intermediate(?) files to a different location with more storage

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Iniciar sesión para comentar.

Iniciar sesión para responder a esta pregunta.

Answer 1

Christian el 10 de Jul. de 2019

1
Enlazar

Enlace directo a esta respuesta

https://es.mathworks.com/matlabcentral/answers/470861-mapreduce-on-parallel-cluster-database-or-disk-full-how-control-storage-of-intermediate-files#answer_382660

Abrir en MATLAB Online

Figured it out:

The data is stored in the standard directory for temporary files given by

tempdir

Setting the corresponding environment variable to a directory with higher capacity solves the issue, e.g.:

setenv('TMP', 'LargerDirectory')

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Iniciar sesión para comentar.

Mapreduce on parallel cluster - Database or disk full - How control storage of intermediate files?

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Respuesta aceptada

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Más respuestas (0)

Ver también

Categorías

Etiquetas

Productos

Versión

Community Treasure Hunt

Mapreduce on parallel cluster - Database or disk full - How control storage of intermediate files?

0 comentarios Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Respuesta aceptada

0 comentarios Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

Más respuestas (0)

Ver también

Categorías

Etiquetas

Productos

Versión

Community Treasure Hunt

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos

0 comentarios
Mostrar -2 comentarios más antiguosOcultar -2 comentarios más antiguos