
matlab.mapreduce.DeploySparkMapReducer Class

Namespace: matlab.mapreduce

Configure a MATLAB tall array application with Spark parameters as key-value pairs

Description

A DeploySparkMapReducer object stores the configuration parameters of a tall array application being deployed to Spark™. Every tall array application must be configured before it is deployed on a Spark cluster. Some of the configuration parameters define properties of the application, and some are used by Spark to allocate resources on the cluster. The configuration parameters are passed to a Spark cluster through the mapreducer function.

Construction

conf = matlab.mapreduce.DeploySparkMapReducer('AppName',name,'Master',url,'SparkProperties',prop) creates a DeploySparkMapReducer object with the specified configuration parameters.

conf = matlab.mapreduce.DeploySparkMapReducer('AppName',name,'Master',url,'SparkProperties',prop,Name,Value) creates a DeploySparkMapReducer object with additional configuration parameters specified by one or more Name,Value pair arguments. Name is a property name of the class and Value is the corresponding value. Name must appear inside single quotes (''). You can specify several name-value pair arguments in any order as Name1,Value1,...,NameN,ValueN.
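For instance, assuming prop is a containers.Map of Spark configuration properties (see Input Arguments below), a minimal construction might look like this:

conf = matlab.mapreduce.DeploySparkMapReducer( ...
    'AppName','myApp', ...
    'Master','yarn-client', ...
    'SparkProperties',prop);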

Input Arguments


name

Name of the application, specified as a character vector inside single quotes ('').

Example: 'AppName', 'myApp'

Data Types: char | string

url

Name of the master URL, specified as a character vector inside single quotes ('').

yarn-client
Connect to a Hadoop® YARN cluster in client mode. The cluster location is found based on the HADOOP_CONF_DIR or YARN_CONF_DIR environment variable.

Example: 'Master', 'yarn-client'

Data Types: char | string

prop

Spark configuration properties, specified as a containers.Map object containing key-value pairs.

When deploying to a Hadoop YARN cluster, set the value of prop with the appropriate Spark configuration properties as key-value pairs. The precise set of Spark configuration properties varies from one deployment scenario to another, based on the deployment cluster environment. Verify the Spark setup with your system administrator so that you use the appropriate configuration properties. See the tables below for commonly used Spark properties. For the full set of properties, see the latest Spark documentation.

Running Spark on YARN

spark.executor.cores
Default: 1
The number of cores to use on each executor. For YARN and Spark standalone mode only. In Spark standalone mode, setting this parameter allows an application to run multiple executors on the same worker, provided that there are enough cores on that worker. Otherwise, only one executor per application runs on each worker.

spark.executor.instances
Default: 2
The number of executors. Note: this property is incompatible with spark.dynamicAllocation.enabled. If both spark.dynamicAllocation.enabled and spark.executor.instances are specified, dynamic allocation is turned off and the specified number of spark.executor.instances is used.

spark.driver.memory
Default: 1g (2048m recommended)
Amount of memory to use for the driver process. If you get out-of-memory errors while using tall/gather, consider increasing this value.

spark.executor.memory
Default: 1g (2048m recommended)
Amount of memory to use per executor process. If you get out-of-memory errors while using tall/gather, consider increasing this value.

spark.yarn.executor.memoryOverhead
Default: executorMemory * 0.10, with a minimum of 384 (4096m recommended)
The amount of off-heap memory (in MB) to be allocated per executor. If you get out-of-memory errors while using tall/gather, consider increasing this value.

spark.dynamicAllocation.enabled
Default: false
Setting this property to true enables dynamic resource allocation, which scales the number of executors registered with this application up and down based on the workload. This option integrates Spark with the YARN resource management: Spark initiates as many executors as possible, given the executor memory requirement and number of cores. This property requires the cluster to be set up for it and requires spark.shuffle.service.enabled to be set to true. The following configurations are also relevant: spark.dynamicAllocation.minExecutors, spark.dynamicAllocation.maxExecutors, and spark.dynamicAllocation.initialExecutors.

spark.shuffle.service.enabled
Default: false
Enables the external shuffle service. This service preserves the shuffle files written by executors so that the executors can be safely removed. This property must be set to true if spark.dynamicAllocation.enabled is true. The external shuffle service must be set up on the cluster before you can enable it.
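
As a sketch, the recommended memory settings from the table above can be collected into a single containers.Map and passed as the 'SparkProperties' value. The values are the recommendations shown in the table; tune them for your cluster:

memoryProperties = containers.Map( ...
    {'spark.driver.memory', ...
     'spark.executor.memory', ...
     'spark.yarn.executor.memoryOverhead'}, ...
    {'2048m', ...
     '2048m', ...
     '4096m'});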

MATLAB Specific Properties

spark.matlab.worker.debug
Default: false
For use in standalone/interactive mode only. If set to true, a Spark deployable MATLAB application executed within the MATLAB desktop environment starts another MATLAB session as the worker and enters the debugger. Logging information is directed to log_<nbr>.txt.

spark.matlab.worker.reuse
Default: true
When set to true, a Spark executor pools workers and reuses them from one stage to the next. Workers terminate when the executor under which they are running terminates.

spark.matlab.worker.profile
Default: false
Only valid when using a MATLAB session as a worker. When set to true, turns on the MATLAB Profiler and generates a profile report that is saved to the file profworker_<split_index>_<socket>_<worker pass>.mat.

spark.matlab.worker.numberOfKeys
Default: 10000
Number of unique keys that a containers.Map object can hold while performing *ByKey operations before map data is spilled to a file.

spark.matlab.executor.timeout
Default: 600000
Spark executor timeout in milliseconds. Not applicable when deploying tall arrays.
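
As an illustration, the following sketch turns on worker debugging and profiling for an interactive run. Per the table above, these two properties are only meaningful when the worker is a MATLAB session in standalone/interactive mode:

workerProperties = containers.Map( ...
    {'spark.matlab.worker.debug', ...
     'spark.matlab.worker.profile'}, ...
    {'true', ...
     'true'});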

Monitoring and Logging

spark.history.fs.logDirectory
Default: file:/tmp/spark-events
Directory that contains application event logs to be loaded by the history server.

spark.eventLog.dir
Default: file:///tmp/spark-events
Base directory in which Spark events are logged, if spark.eventLog.enabled is true. Within this base directory, Spark creates a subdirectory for each application and logs the events specific to that application there. You can set this to a unified location, such as an HDFS™ directory, so that history files can be read by the history server.

spark.eventLog.enabled
Default: false
Whether to log Spark events. Logging them is useful for reconstructing the web UI after the application has finished.
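
For example, to make completed runs visible to a Spark history server, you might enable event logging and point both spark.eventLog.dir and spark.history.fs.logDirectory at the same shared location. The HDFS path below is a placeholder:

loggingProperties = containers.Map( ...
    {'spark.eventLog.enabled', ...
     'spark.eventLog.dir', ...
     'spark.history.fs.logDirectory'}, ...
    {'true', ...
     'hdfs://hadoopfs:54310/user/<username>/spark-events', ...
     'hdfs://hadoopfs:54310/user/<username>/spark-events'});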

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

MCRRoot

Path to the MATLAB Runtime installation, specified as a character vector inside single quotes ('').

Example: 'MCRRoot', '/share/MATLAB/MATLAB_Runtime/v91'

Data Types: char | string
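
For example, to point a deployed application at a specific MATLAB Runtime installation, pass 'MCRRoot' alongside the required arguments. This sketch assumes prop is a containers.Map of Spark properties and reuses the example path above:

conf = matlab.mapreduce.DeploySparkMapReducer( ...
    'AppName','myApp', ...
    'Master','yarn-client', ...
    'SparkProperties',prop, ...
    'MCRRoot','/share/MATLAB/MATLAB_Runtime/v91');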

Log level to set, specified as a character vector inside single quotes ('').

Data Types: char | string

Properties

The properties of this class are hidden.

Methods

There are no user-executable methods for this class.

Examples


Define Spark properties and create a DeploySparkMapReducer object.

% Spark configuration properties, specified as key-value pairs
sparkProperties = containers.Map( ...
    {'spark.executor.cores', ...
     'spark.executor.memory', ...
     'spark.yarn.executor.memoryOverhead', ...
     'spark.dynamicAllocation.enabled', ...
     'spark.shuffle.service.enabled', ...
     'spark.eventLog.enabled', ...
     'spark.eventLog.dir'}, ...
    {'1', ...
     '2g', ...
     '1024', ...
     'true', ...
     'true', ...
     'true', ...
     'hdfs://hadoopfs:54310/user/<username>/sparkdeploy'});

% Configure the tall array application with the Spark properties
conf = matlab.mapreduce.DeploySparkMapReducer( ...
    'AppName','myTallApp', ...
    'Master','yarn-client', ...
    'SparkProperties',sparkProperties);

% Set the configured object as the execution environment
mapreducer(conf);
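
After mapreducer(conf) runs, subsequent tall array operations execute on the Spark cluster. A hypothetical continuation is sketched below; the dataset location and variable names are placeholders, not part of the documented example:

% Create a tall table from a datastore on HDFS and gather a result.
% gather triggers the deferred tall computation on the Spark cluster.
ds = datastore('hdfs://hadoopfs:54310/user/<username>/airlinedata/*.csv');
tt = tall(ds);
meanDelay = gather(mean(tt.ArrDelay,'omitnan'));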

Version History

Introduced in R2016b