gather

Collect tall array into memory after executing queued operations

Description

Y = gather(X) executes all queued operations required to calculate unevaluated tall array X and collects the results into memory as Y.

MATLAB® can run out of memory if the result of the gather calculation is too large. If you are unsure whether the result can fit in memory, use gather(head(X)) or gather(tail(X)) to perform the full calculation, but bring only a small portion of the result into memory.
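As a minimal sketch of this preview pattern, the following code builds a small tall array from in-memory data (an assumption made only so the snippet is self-contained; real tall arrays usually come from a datastore) and gathers just the first and last rows of a queued calculation. The variable names are illustrative.

```matlab
% Small in-memory stand-in for a large data set (assumption for illustration).
X = tall((1:1000)');          % 1000x1 tall double column vector
Z = X.^2 + 1;                 % queued calculation, not yet evaluated

% Bring only a small portion of the result into memory.
firstRows = gather(head(Z));     % head returns the first 8 rows by default
lastRows  = gather(tail(Z,5));   % last 5 rows of the result
```

If the previews look reasonable and the full result is known to be small enough, gather(Z) can then collect the whole array.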

Use gather sparingly so that MATLAB can combine passes through the data whenever possible. For more information, see Lazy Evaluation of Tall Arrays.
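The following sketch contrasts a single gather call with separate calls. It assumes a small tall array created from in-memory random data purely for illustration; the variable names are hypothetical. Both approaches return the same values, but a single call gives MATLAB the opportunity to share passes through the data between the two results.

```matlab
% Small in-memory stand-in for a large data set (assumption for illustration).
X = tall(rand(500,3));
m = mean(X);                  % queued, unevaluated
s = std(X);                   % queued, unevaluated

% One call: both results are evaluated together.
[m1,s1] = gather(m,s);

% Separate calls: same values, but each call requests its own evaluation.
m2 = gather(m);
s2 = gather(s);
```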

[Y1,Y2,Y3,...] = gather(X1,X2,X3,...) gathers multiple unevaluated tall arrays X1, X2, X3,... into the corresponding outputs Y1, Y2, Y3,....

Examples

Create a datastore for the airlinesmall.csv data set. Select a subset of variables to work with, and treat 'NA' values as missing data so that tabularTextDatastore replaces them with NaN values. Convert the datastore into a tall table.

varnames = {'Year','ArrDelay','UniqueCarrier'};
ds = tabularTextDatastore('airlinesmall.csv', 'TreatAsMissing', 'NA',...
    'SelectedVariableNames',varnames);
T = tall(ds)
T =

  Mx3 tall table

    Year    ArrDelay    UniqueCarrier
    ____    ________    _____________

    1987        8          {'PS'}    
    1987        8          {'PS'}    
    1987       21          {'PS'}    
    1987       13          {'PS'}    
    1987        4          {'PS'}    
    1987       59          {'PS'}    
    1987        3          {'PS'}    
    1987       11          {'PS'}    
     :         :              :
     :         :              :

Calculate the size of the tall table.

sz = size(T)
sz =

  1x2 tall double row vector

    ?    ?

MATLAB® does not immediately evaluate most operations on tall arrays. Instead, MATLAB remembers the operations you perform as you enter them and optimizes the calculations in the background.

When you use gather on an unevaluated tall array, MATLAB executes all of the queued operations using the minimum number of passes through the data. This optimization greatly reduces the execution time of large calculations. For this reason, you should use gather only when you need to see a result.

Use gather to execute the calculation and collect the result into memory.

S = gather(sz)
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.47 sec
Evaluation completed in 0.61 sec
S = 1×2

      123523           3

Use gather with several inputs to simultaneously evaluate several tall arrays.

Create a tall array from an in-memory array of random integers between 1 and 1000. Calculate the maximum and minimum values in each column.

A = tall(randi(1000,100,7))
A =

  100x7 tall double matrix

   815   163   645    60   423   583   851
   906   795   379   682    95   541   561
   127   312   812    43   599   870   930
   914   529   533    72   471   265   697
   633   166   351   522   696   319   583
    98   602   940    97   700   120   816
   279   263   876   819   639   940   880
   547   655   551   818    34   646   989
    :     :     :     :     :     :     :
    :     :     :     :     :     :     :

b = min(A);
c = max(A);

Use the results to determine the overall minimum and maximum values in the array. Collect the final result into memory.

[mnA,mxA] = gather(min(b),max(c));
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: Completed in 0.2 sec
Evaluation completed in 0.45 sec
valRange = [mnA mxA]
valRange = 1×2

           1        1000

Input Arguments

X is an unevaluated tall array. An unevaluated tall array is any tall array on which you perform calculations without using gather to fully evaluate those calculations.

Output Arguments

In-memory array. The data type of Y is the same as the underlying data type of the unevaluated tall array X.
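A short sketch of this relationship, assuming a tall array built from a small in-memory single-precision vector (chosen only to keep the snippet self-contained): the gathered output is an ordinary in-memory array whose class matches the underlying type of the tall input.

```matlab
% Tall array with underlying type single (in-memory source is an assumption).
X = tall(single([1.5; 2.5; 3.5]));
Z = 2*X;                      % unevaluated tall array

Y = gather(Z);                % in-memory result

inputIsTall  = istall(Z);     % true: Z is still a tall array
outputIsTall = istall(Y);     % false: Y is a regular in-memory array
outputClass  = class(Y);      % 'single', matching the underlying type of Z
```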

Tips

  • To gather several outputs from a function that returns multiple output arguments, first assign all of the outputs to variables, and then pass those variables to gather. For example,

    [a,b] = bounds(X);
    [a,b] = gather(a,b);

  • If you have Parallel Computing Toolbox™, see gather (Parallel Computing Toolbox) for information about gathering distributed and gpuArray computations.

Extended Capabilities

Tall Arrays
Calculate with arrays that have more rows than fit in memory.

GPU Code Generation
Generate CUDA® code for NVIDIA® GPUs using GPU Coder™.

Thread-Based Environment
Run code in the background using MATLAB® backgroundPool or accelerate code with Parallel Computing Toolbox™ ThreadPool.

Version History

Introduced in R2016b