
When to Run Statistical Functions in Parallel

Why Run in Parallel?

The main reason to run statistical computations in parallel is speed: reducing the execution time of your program or functions. Factors Affecting Speed discusses the main factors that affect execution speed. Factors Affecting Results discusses details that can cause a parallel run to give different results than a serial run.

Note

Some Statistics and Machine Learning Toolbox™ functions have built-in parallel computing capabilities. See Quick Start Parallel Computing for Statistics and Machine Learning Toolbox. You can also use any Statistics and Machine Learning Toolbox functions with Parallel Computing Toolbox™ functions such as parfor loops. To decide when to call functions in parallel, consider the factors affecting speed and results.

Factors Affecting Speed

Several factors can affect the speed of parallel execution:

  • Parallel environment setup. It takes time to run parpool to begin computing in parallel. If your computation is fast, the setup time can exceed any time saved by computing in parallel, as the timing sketch after this list illustrates.

  • Parallel overhead. There is overhead in communication and coordination when running in parallel. If function evaluations are fast, this overhead could be an appreciable part of the total computation time. Thus, solving a problem in parallel can be slower than solving the problem serially. For an example, see Improving Optimization Performance with Parallel Computing in MATLAB® Digest, March 2009.

  • No nested parfor loops. As described in Working with parfor, a parfor loop called from within another parfor loop runs serially, not in parallel. If your custom functions are written to take advantage of parallel processing, this limitation can cause a parallel run to be slower than expected.

  • When executing serially, parfor loops run slightly slower than for loops.

  • Passing parameters. Parameters are automatically passed to worker sessions during the execution of parallel computations. If there are many parameters, or they take a large amount of memory, passing them can slow the execution of your computation; see the second sketch after this list.

  • Contention for resources: network and computing. If the pool of workers has low bandwidth or high latency, parallel computation can be slow.
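
For illustration, here is a rough timing sketch. It assumes Parallel Computing Toolbox is installed and no pool is already open, and it uses a hypothetical, very cheap function so that the setup and overhead costs stand out:

    % Compare pool setup cost and parfor overhead against a plain for loop.
    tic
    pool = parpool;                 % one-time setup cost
    setupTime = toc;

    f = @(x) sum(x.^2);             % hypothetical, very cheap function

    n = 1000;
    a = zeros(1,n);
    b = zeros(1,n);

    tic
    for i = 1:n
        a(i) = f(i);
    end
    serialTime = toc;

    tic
    parfor i = 1:n
        b(i) = f(i);                % per-iteration overhead dominates here
    end
    parallelTime = toc;

    fprintf('setup %.1f s, serial %.4f s, parfor %.4f s\n', ...
        setupTime, serialTime, parallelTime)

    delete(pool)                    % shut down the pool when finished

For a loop body this cheap, the setup time and communication overhead typically outweigh any gain from running the iterations in parallel.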
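
The next sketch illustrates parameter passing. The matrix A is sliced, so each worker receives only the rows it needs, while the vector w is a broadcast variable that is copied whole to every worker; the names and sizes here are only illustrative:

    w = rand(5000,1);               % broadcast whole to each worker
    A = rand(200,5000);             % sliced by row across workers
    y = zeros(1,200);
    parfor i = 1:200
        y(i) = A(i,:) * w;          % A(i,:) is sliced; w is broadcast
    end

When broadcast variables are large, copying them to the workers can account for a significant share of the total run time.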

Factors Affecting Results

Some factors can affect results when using parallel processing. You might need to adjust your code to run in parallel; for example, loop iterations must be independent, and the workers must be able to access all the variables they need. Some important factors are:

  • Persistent or global variables. If any functions use persistent or global variables, these variables can take different values on different worker processors. The body of a parfor loop cannot contain global or persistent variable declarations.

  • Accessing external files. The order of computations is not guaranteed during parallel processing, so external files can be accessed in an unpredictable order, leading to unpredictable results. Furthermore, if multiple processors try to read an external file simultaneously, the file can become locked, leading to a read error and halting function execution.

  • Noncomputational functions, such as input, plot, and keyboard, can behave unpredictably when used in your custom functions. Do not use these functions in a parfor loop, because they can cause a worker to become unresponsive while it waits for input.

  • parfor does not allow break or return statements.

  • The random numbers you use can affect the results of your computations. See Reproducibility in Parallel Statistical Computations, and the sketch following this list.
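
As a sketch of one reproducible pattern (assuming Parallel Computing Toolbox), the following loop ties a random-number substream to each iteration, so the results do not depend on which worker runs which iteration:

    % Share one stream across the pool and select a substream per iteration.
    sc = parallel.pool.Constant(RandStream('mlfg6331_64','Seed',0));
    n = 10;
    r = zeros(1,n);
    parfor i = 1:n
        s = sc.Value;               % per-worker copy of the shared stream
        s.Substream = i;            % tie the substream to the iteration
        r(i) = rand(s);
    end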

For advice on converting for loops to use parfor, see Parallel for-Loops (parfor) (Parallel Computing Toolbox).
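
As a minimal illustration, the following loop has independent iterations and a sliced output, so it converts to parfor directly; the function and loop bounds are only placeholders:

    n = 1e4;
    y = zeros(1,n);
    for i = 1:n                     % serial version
        y(i) = besselj(1, i/100);
    end

    parfor i = 1:n                  % parallel version, same loop body
        y(i) = besselj(1, i/100);
    end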