# Improving Simulation Performance in Simulink

By Seth Popinchalk, MathWorks

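Whatever the level of complexity of the model, every Simulink® user wants to improve simulation performance. This article presents tips and techniques to help you make the most of your memory and processing power. It covers selecting a simulation mode, identifying simulation bottlenecks, modifying and simplifying your model, and running multiple simulations in parallel.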

While every situation is unique, the techniques described here can be applied to a wide range of projects. Use them as a list of things to try whenever you need to increase the speed of your simulations.

## Selecting a Simulation Mode

In Simulink, three modes affect simulation performance: Normal, Accelerator, and Rapid Accelerator (Figure 1). As their names imply, Accelerator is faster than Normal, and Rapid Accelerator is faster still. Each increase in speed typically means sacrificing another capability—for example, flexibility, interactivity, or diagnostics. In many cases, if you can work without one of these capabilities—at least temporarily—simulation performance will improve.

In Normal mode, Simulink interprets your model during each simulation run. If you change your model frequently, this is generally the preferred mode to use because it requires no separate compilation step.

In Accelerator mode, Simulink compiles a model into a binary shared library or DLL where possible, eliminating the block-to-block overhead of an interpreted simulation in Normal mode. Accelerator mode supports the debugger and profiler, but not runtime diagnostics.

In Rapid Accelerator mode, Simulink compiles a standalone executable for the model, which can run on a separate processing core. You can use Rapid Accelerator mode only when the full model is capable of generating code, and this mode restricts interaction with the model during simulations. For example, Rapid Accelerator mode does not support debugging. As with Accelerator mode, it's best to use Rapid Accelerator mode when your simulations take much more time than the one-time cost of compilation.
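The trade-off between the one-time compilation cost and the per-run saving can be made concrete with a little arithmetic. The sketch below is only an illustration: the function name `break_even_runs` and all timings are hypothetical placeholders, not measurements from this article.

```python
import math

def break_even_runs(compile_s, normal_run_s, accel_run_s):
    """Smallest number of runs for which compiling once is faster overall.

    Total time in Normal mode:      n * normal_run_s
    Total time in a compiled mode:  compile_s + n * accel_run_s
    Returns None when the compiled run is not faster per run.
    """
    saved_per_run = normal_run_s - accel_run_s
    if saved_per_run <= 0:
        return None
    # First n with n * saved_per_run strictly greater than compile_s.
    return math.floor(compile_s / saved_per_run) + 1

# Hypothetical timings in seconds (illustrative only).
runs_needed = break_even_runs(compile_s=60.0, normal_run_s=10.0, accel_run_s=4.0)
print(runs_needed)
```

With these placeholder numbers, each compiled run saves 6 seconds, so the 60-second build pays for itself only from the eleventh run onward. If you change the model (and therefore rebuild) more often than that, Normal mode is likely the better choice.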

You can trade off compile-time speed for simulation speed by setting the compiler optimization level. Compiler optimizations for the accelerator modes are disabled by default. Enabling them (Figure 2) will accelerate simulation runs but result in longer build times. The speed and efficiency of the C compiler used for Accelerator and Rapid Accelerator modes also affect the time required in the compile step.

Figure 2. Compiler optimizations.

### When Accelerator Modes Might Not Substantially Improve Performance

Accelerator or Rapid Accelerator mode may not improve performance much in the following situations:

Your model's algorithm is primarily contained in a few complex blocks, such as the Fast Fourier Transform block or lookup tables. A small model may run slower in an accelerator mode because native blocks are highly optimized. In contrast, a model with many basic blocks is more likely to benefit from compilation.

Your model contains compiled code, such as code from S-functions, Stateflow® blocks, and MATLAB® functions. Because this code is already compiled, the compilation step will not speed it up further.

Your model contains blocks that cannot be compiled, such as Interpreted MATLAB Function blocks.

Your simulation runs include long initialization or termination phases. Because the accelerator modes speed up only the simulation phase of each run, they may not offer much improvement if your runs require time-consuming initialization or termination phases. For details, see the Accelerating the Initialization Phase section of this article.

Your system lacks sufficient memory. Memory issues can slow simulations, particularly for large models running on 32-bit operating systems. You can monitor Simulink memory usage with the sldiagnostics function. If you use a 32-bit Windows system, consider using the Windows 3GB startup switch or switching to a 64-bit system.

You log large data sets. When logging large amounts of data (for example, in models that include the Workspace I/O, To Workspace, To File, or Scope blocks), use decimation or limit the logged output to the last part of the simulation. Avoid logging redundant data (for example, log the time only once) and extraneous data (for example, log integer values instead of doubles when feasible). To File blocks save data incrementally in an array format, so use them for logging tasks when managing memory usage for long simulations is a priority.
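The two logging strategies mentioned above, decimation and keeping only the last part of a run, can be sketched in a few lines. This is a generic illustration rather than Simulink's logging implementation; `log_signal`, `decimation`, and `keep_last` are hypothetical names.

```python
from collections import deque

def log_signal(samples, decimation=1, keep_last=None):
    """Keep every `decimation`-th sample, optionally only the last `keep_last`."""
    if decimation < 1:
        raise ValueError("decimation must be >= 1")
    buffer = deque(maxlen=keep_last)  # maxlen=None means the log is unbounded
    for index, value in enumerate(samples):
        if index % decimation == 0:
            buffer.append(value)  # a full bounded deque drops its oldest entry
    return list(buffer)

signal = range(1000)  # stand-in for 1000 simulation outputs
full = log_signal(signal)
decimated = log_signal(signal, decimation=10)
tail = log_signal(signal, decimation=10, keep_last=20)
print(len(full), len(decimated), len(tail))
```

Here the full log holds 1000 points, the decimated log 100, and the bounded log only the final 20, so memory use stays constant no matter how long the run is.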

## Identifying Simulation Bottlenecks

If the acceleration modes do not provide the simulation speed you require, use the sldiagnostics function, Simulink Profiler, and Simulink Model Advisor to identify and eliminate simulation bottlenecks. The sldiagnostics function examines your Simulink model without running it and displays diagnostic information, including how many instances of each block are in your model and how much time and memory compilation requires.

As you review the results produced by sldiagnostics, note how many interpreted MATLAB functions your model uses (Figure 3). Because data exchange between MATLAB and Simulink passes through several software layers, Interpreted MATLAB Function blocks usually slow simulations, particularly if the model needs many data exchanges. Additionally, because interpreted MATLAB functions cannot be compiled, Interpreted MATLAB Function blocks impede attempts to use an acceleration mode to speed up simulations.

Figure 3. Sample output of sldiagnostics. The model uses three MATLAB Function blocks.

Simulink Profiler lets you quantify exactly how much time each phase of your simulation takes and how much time each block takes to simulate (Figure 4). This procedure can generate a great deal of data. To minimize the amount of data that you need to review, focus on those methods that consume the most time and those that are most frequently called.

Figure 4. A sample Simulink Profiler report.

### Using MATLAB Functions Instead of Interpreted MATLAB Function Blocks

To call a MATLAB function within your Simulink model, use a MATLAB Function block instead of an Interpreted MATLAB Function block or a MATLAB S-function. (The MATLAB Function block was previously called the Embedded MATLAB block, and the Interpreted MATLAB Function block was previously called the MATLAB Function block.) The MATLAB Function block is the faster alternative. It supports the generation of embeddable C code, and does not incur the data packaging overhead required by the Interpreted MATLAB Function block. While the MATLAB Function block does not support all MATLAB functions, the subset of the MATLAB language that it does support is extensive. By replacing your interpreted MATLAB code with code that uses only this embeddable MATLAB subset, you can significantly improve performance.

To quickly find all the Interpreted MATLAB Function blocks in your model, open Model Explorer and search your model by Block Type, selecting MATLABFcn as the type.

## Modifying and Simplifying Your Model

Most of the techniques described so far require few, if any, changes to the model itself. You can achieve additional performance improvements by applying techniques that involve modifications to the model.

### Accelerating the Initialization Phase

Large images and complex graphics take a long time to load and render. As a result, masked blocks that contain images might make your model less responsive. To accelerate the initialization phase of a simulation, remove complex drawings and images. If you don't want to remove an image, you can still improve performance by replacing it with a smaller, low-resolution version. To do this, use the Mask Editor and edit the icon drawing commands to change the image that is loaded by the call to image().

When you update or open a model, Simulink runs the mask initialization code. If you have complicated mask initialization commands that contain many calls to set_param(), consider consolidating consecutive calls into a single call with multiple argument pairs. This can reduce the overhead associated with these calls.

If you use MATLAB scripts to load and initialize data, you can often improve performance by loading MAT-files instead. The drawback is that the data in a MAT-file is not in a human-readable form, and can therefore be more difficult to work with than a script. However, load typically initializes data much more quickly than the equivalent script.

### Reducing Interactivity

In general, the more interactive the model, the longer it will take to simulate. The tips in this section illustrate ways to improve performance by giving up some interactivity.

Enable inline parameters optimization. When you enable this optimization in the Optimization pane of the Configuration Parameters dialog box, Simulink uses the numerical values of model parameters instead of their symbolic names. This substitution can improve performance by reducing the parameter tuning computations performed during simulations.

Disable debugging diagnostics. Some enabled diagnostic features noticeably slow simulations. You can disable them in the Diagnostics pane of the Configuration Parameters dialog box.

Note: Running the array bounds exceeded and solver data inconsistency diagnostics can cause a noticeable slowdown in model run-time performance.

Disable MATLAB debugging and use BLAS library support. After verifying that your MATLAB code works correctly, disable debugging support. In the Simulation Target pane of the Configuration Parameters dialog box, disable debugging/animation, overflow detection, and echoing expressions without semicolons (Figure 5).

Figure 5. The Simulation Target pane of the Configuration Parameters dialog box.

If your simulation involves low-level MATLAB matrix operations, enable the Basic Linear Algebra Subprograms (BLAS) Library feature to make use of highly optimized external linear algebra routines.

Disable Stateflow animations. By default, Stateflow charts highlight the current active states and animate the state transitions that take place as the model runs. This feature is useful for debugging, but it slows the simulation. To accelerate simulations, either close all Stateflow charts or disable the animation. Similarly, if you're using Simulink 3D Animation™, SimMechanics™ visualization, FlightGear, or another 3D animation package, consider disabling the animation or reducing scene fidelity to improve performance.

Adjust viewer-specific parameters and manage viewers through enabled subsystems. If your model contains a scope viewer that displays a large number of data points and you can't eliminate the scope, try adjusting the viewer parameters to trade off fidelity for rendering speed. Be aware, however, that by using decimation to reduce the number of plotted data points, you risk missing short transients and other phenomena that would be obvious with more data points. You can place viewers in enabled subsystems to more precisely control which visualizations are enabled and when.
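The risk of missing short transients when decimating plotted data is easy to demonstrate. The signal below is a hypothetical example: flat except for a single one-sample spike.

```python
def decimate(samples, factor):
    """Keep every `factor`-th sample, starting with the first."""
    return samples[::factor]

# Hypothetical signal: flat at 0.0 with a one-sample spike at index 123.
signal = [0.0] * 200
signal[123] = 5.0

plotted = decimate(signal, 10)
print(max(signal), max(plotted))
```

Because index 123 is not a multiple of 10, the spike never reaches the plotted data, and the decimated trace looks perfectly flat.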

### Reducing Model Complexity

Simplifying your model without sacrificing the fidelity you need is an effective way to improve simulation performance. Here are three ways to reduce model complexity.

Replace a subsystem with a lower-fidelity alternative. In many cases, you can simplify your model by replacing a complex subsystem model with one of the following:

• A linear or nonlinear dynamic model created from measured input-output data using System Identification Toolbox
• A high-fidelity, nonlinear statistical model created using Model-Based Calibration Toolbox
• A linear model created using Simulink Control Design
• A lookup table

You can maintain both representations of the subsystem in a library and use variant subsystems to manage them.

Reduce the number of blocks. When you reduce the number of blocks in your model, fewer blocks will need to be updated during simulations, leading to faster simulation runs. Vectorization is one way to reduce your block count. For example, if you have several parallel signals that undergo a similar set of computations, try combining them into a vector and performing a single computation. Another way is to simply enable the Block Reduction optimization in the Optimization > General section of the configuration parameters.

Use frame-based processing. In frame-based processing, samples are processed in batches instead of one at a time. If your model includes an analog-to-digital converter, for example, you can collect the output samples in a buffer and process the buffer with a single operation, such as a fast Fourier transform. Processing data in chunks in this way reduces the number of times that blocks in your model must be invoked. In general, scheduling overhead decreases as frame size increases. However, larger frames consume more memory, and memory limitations can adversely affect the performance of complex models. Experiment with different frame sizes to find one that maximizes the performance benefit of frame-based processing without causing memory issues.
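The effect of frame size on the number of block invocations can be sketched as follows. The helper names and the sample count are hypothetical, and the example only counts calls; it does not model Simulink's scheduler.

```python
def count_block_calls(num_samples, frame_size):
    """Number of times a block runs when it handles `frame_size` samples per call."""
    if frame_size < 1:
        raise ValueError("frame_size must be >= 1")
    return -(-num_samples // frame_size)  # ceiling division

def frames(samples, frame_size):
    """Yield consecutive frames; the last frame may be shorter."""
    for start in range(0, len(samples), frame_size):
        yield samples[start:start + frame_size]

samples = list(range(10_000))  # stand-in for converter output samples
for frame_size in (1, 64, 1024):
    calls = sum(1 for _ in frames(samples, frame_size))
    # The buffer must hold `frame_size` samples at once.
    print(frame_size, calls)
```

For 10,000 samples, sample-based processing needs 10,000 calls, a frame size of 64 needs 157, and a frame size of 1024 needs only 10, at the cost of a buffer that is 1024 samples long.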

### Choosing and Configuring a Solver

Simulink provides a comprehensive library of solvers, including fixed-step and variable-step solvers to handle stiff and nonstiff systems. Each solver determines the time of the next simulation step and applies a numerical method to solve ordinary differential equations that represent the model. The solver you choose and the solver options you specify will affect simulation speed.

Select a solver that matches the stiffness of your system. A stiff system has both slowly and quickly varying continuous dynamics. Implicit solvers are specifically designed for stiff problems, whereas explicit solvers are designed for nonstiff problems. Using nonstiff solvers to solve stiff systems is inefficient and can lead to incorrect results. If a nonstiff solver uses a very small step size to solve your model, it may be because you have a stiff system.
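A small numerical experiment shows why a nonstiff method struggles on a stiff problem. The sketch below uses forward and backward Euler as simple stand-ins for explicit and implicit solvers (they are not Simulink's solvers), applied to a hypothetical system with one slow state and one fast state.

```python
def explicit_euler(rates, y0, h, steps):
    """Forward Euler for the decoupled linear system y_i' = -rates[i] * y_i."""
    y = list(y0)
    for _ in range(steps):
        y = [yi * (1.0 - h * r) for yi, r in zip(y, rates)]
    return y

def implicit_euler(rates, y0, h, steps):
    """Backward Euler for the same system: solves y_new = y - h * r * y_new."""
    y = list(y0)
    for _ in range(steps):
        y = [yi / (1.0 + h * r) for yi, r in zip(y, rates)]
    return y

rates = [1.0, 1000.0]  # slow and fast dynamics (hypothetical values)
y0 = [1.0, 1.0]

# Both methods integrate to t = 1; the exact slow state is exp(-1), about 0.368.
coarse_explicit = explicit_euler(rates, y0, h=0.01, steps=100)
coarse_implicit = implicit_euler(rates, y0, h=0.01, steps=100)
fine_explicit = explicit_euler(rates, y0, h=0.001, steps=1000)
print(coarse_explicit, coarse_implicit, fine_explicit)
```

With a step of 0.01, the explicit method multiplies the fast state by 1 - 0.01 * 1000 = -9 at every step and diverges, while the implicit method stays stable. The explicit method recovers only when the step is cut to 0.001, which takes ten times as many steps even though the fast state has long since decayed.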

Choose a variable-step or fixed-step solver based on your model's step size and dynamics. Exercise caution when deciding whether to use a variable-step or fixed-step solver; otherwise, your solver could take additional time steps to capture dynamics that are not important to you, or it could perform unnecessary calculations to work out the next time step.

In general, simulations run with variable-step solvers are faster than those run with fixed-step solvers. With a fixed-step solver, the step size must be less than or equal to the fundamental sample time of the model. A variable-step solver, by contrast, adjusts the step size dynamically. As a result, the step size for some time steps is larger than the fundamental sample time, reducing the number of steps required to complete the simulation.

As a rule, choose a fixed-step solver when the fundamental sample time of your model is equal to one of the model's sample times. Choose a variable-step solver to capture continuous dynamics, or when the fundamental sample time of your model is smaller than the shortest sample time (that is, the period of the fastest sample rate).
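This rule can be checked with a small calculation. The sketch below assumes that the fundamental sample time is the greatest common divisor of the model's sample times, and that a variable-step solver on a purely discrete model steps only to actual sample hits. The sample times of 0.5 s and 0.75 s and all function names are hypothetical.

```python
from fractions import Fraction
from math import gcd

def fundamental_sample_time(sample_times):
    """Greatest common divisor of the sample times, as an exact fraction."""
    fracs = [Fraction(str(t)) for t in sample_times]
    common_den = 1
    for f in fracs:  # least common multiple of the denominators
        common_den = common_den * f.denominator // gcd(common_den, f.denominator)
    common_num = 0
    for f in fracs:  # gcd of the numerators over the common denominator
        common_num = gcd(common_num, f.numerator * (common_den // f.denominator))
    return Fraction(common_num, common_den)

def fixed_step_count(sample_times, stop_time):
    """Steps taken when every step equals the fundamental sample time."""
    return int(Fraction(str(stop_time)) / fundamental_sample_time(sample_times))

def sample_hit_count(sample_times, stop_time):
    """Steps taken when the solver steps only to sample hits with t > 0."""
    stop = Fraction(str(stop_time))
    hits = set()
    for t in sample_times:
        period = Fraction(str(t))
        k = 1
        while k * period <= stop:
            hits.add(k * period)
            k += 1
    return len(hits)

sample_times_s = [0.5, 0.75]  # hypothetical sample times in seconds
print(fundamental_sample_time(sample_times_s))
print(fixed_step_count(sample_times_s, 3), sample_hit_count(sample_times_s, 3))
```

For these sample times the fundamental sample time is 0.25 s, which is smaller than either sample time. Over 3 seconds, stepping at the fundamental sample time takes 12 steps, whereas stepping only to sample hits takes 8. When the sample times are 0.5 s and 1.0 s instead, the fundamental sample time equals the 0.5 s sample time and both approaches take the same number of steps.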

Decrease the solver order. Decreasing the solver order improves simulation speed because it reduces the number of calculations that Simulink performs to determine state outputs. Of course, the results become less accurate as the order of the solver decreases. The goal is to choose the lowest solver order that will produce results that meet your accuracy requirements.

Increase the solver step size or the error tolerance. Similarly, increasing the solver step size or increasing the solver error tolerance usually increases simulation speed at the expense of accuracy. Such changes should be made with care because they can cause Simulink to miss potentially important dynamics during simulations.

Disable zero-crossing detection. Variable-step solvers dynamically adjust the step size, increasing it when a variable changes slowly and decreasing it when a variable changes rapidly. This behavior causes the solver to take many small steps in the vicinity of a discontinuity because this is when a variable is rapidly changing. Accuracy improves, but it often comes with long simulation times.

To avoid the small time steps and long simulations associated with these situations, Simulink uses zero-crossing detection to accurately locate such discontinuities. For systems that exhibit frequent fluctuation between modes of operation (a phenomenon known as chattering), this zero-crossing detection can actually have the opposite effect and slow simulations. In these situations, it may be possible to adjust zero-crossing detection to improve performance.

Note: Zero-crossing detection can be enabled or disabled for specific blocks in the model. You can improve performance by disabling zero-crossing detection for blocks that do not affect the accuracy of the simulation.

### Saving the Simulation State

Engineers typically simulate a Simulink model repeatedly for different inputs, boundary conditions, and operating conditions. In many situations, these simulations share a common startup phase in which the model transitions from its initial state to some other state. An electric motor, for example, may be brought up to speed before various control sequences are tested.

Using the Simulink SimState feature, you can save the simulation state at the end of the startup phase and then restore it for use as the initial state for future simulations. This technique does not improve simulation speed per se, but it can reduce total simulation time for consecutive runs because the startup phase needs to be simulated only once.
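The saving from reusing a stored state can be illustrated with a toy model. The sketch below is not the SimState API; it uses a hypothetical first-order motor model whose entire state is a single speed value, and it counts integration steps with and without a saved startup state.

```python
def simulate(speed, voltage, steps, dt=0.01, tau=0.5):
    """Advance a first-order model d(speed)/dt = (voltage - speed) / tau."""
    for _ in range(steps):
        speed += dt * (voltage - speed) / tau
    return speed

STARTUP_STEPS = 500             # shared spin-up phase
TEST_STEPS = 100                # scenario-specific phase
test_voltages = [0.8, 1.0, 1.2]  # hypothetical control inputs

# Without a saved state: every scenario repeats the startup phase.
results_full = [simulate(simulate(0.0, 1.0, STARTUP_STEPS), v, TEST_STEPS)
                for v in test_voltages]
steps_full = len(test_voltages) * (STARTUP_STEPS + TEST_STEPS)

# With a saved state: run the startup once, then restore it for each scenario.
saved_state = simulate(0.0, 1.0, STARTUP_STEPS)
results_restored = [simulate(saved_state, v, TEST_STEPS) for v in test_voltages]
steps_restored = STARTUP_STEPS + len(test_voltages) * TEST_STEPS

print(steps_full, steps_restored)
```

Both approaches produce identical results, but the first takes 1800 steps and the second only 800. Each individual step costs the same, so the gain comes entirely from not repeating the startup phase.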

## Running Multiple Simulations in Parallel

You can reduce the total amount of time it takes to run multiple independent simulations by distributing simulation tasks among multiple processing cores with Simulink and Parallel Computing Toolbox. You can further reduce overall simulation time by using MATLAB Distributed Computing Server to run the simulations on a computer cluster.

Common use cases for running simulations in parallel include Monte Carlo analysis and design optimization. For example, you might set up a Monte Carlo simulation in which you vary the value of a parameter across a predetermined range. You can then perform simulations for each parameter value independently and in parallel on multiple cores.

You can parallelize many of the tasks involved in design optimization, including estimating model parameters from test data, tuning controller gains to achieve a desired response, optimizing design parameters, performing sensitivity analysis, and performing robustness analysis. The total simulation time decreases as the number of processors in use increases (Figure 6).

Figure 6. Speedup (measured by the ratio of time needed to complete iterations sequentially over the time needed to complete them in parallel) as a function of the number of workers used.

Often, you can convert a sequential algorithm to a parallel algorithm by simply changing a for-loop to a parfor-loop. The parfor construct in Parallel Computing Toolbox is similar to a standard for-loop. The key difference is that parfor distributes the computations performed within the loop to worker processors.
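The same serial-to-parallel pattern can be sketched in Python with the standard `concurrent.futures` module. This is an analogy to the for-to-parfor change, not Parallel Computing Toolbox code; `run_simulation` is a hypothetical stand-in for one independent simulation run.

```python
from concurrent.futures import ProcessPoolExecutor

def run_simulation(gain):
    """Stand-in for one independent simulation run; returns the final output."""
    state = 0.0
    for _ in range(1000):
        state += 0.01 * (gain - state)
    return state

def sweep(gains, workers=None):
    """Run one simulation per gain, serially or on `workers` processes."""
    if workers is None:
        # Serial version: an ordinary loop over the parameter values.
        return [run_simulation(g) for g in gains]
    # Parallel version: the loop body is distributed to worker processes,
    # and results come back in the same order as the inputs.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_simulation, gains))

gains = [0.5 + 0.1 * i for i in range(8)]  # hypothetical parameter sweep
serial_results = sweep(gains)
print(len(serial_results))
```

To run the sweep in parallel, call `sweep(gains, workers=4)` from a script, placing the call under an `if __name__ == "__main__":` guard on platforms that start worker processes by re-importing the script. As with parfor, this works only because each run is independent of the others.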

Starting with Simulink R2012b, you can use the Performance Advisor to check for conditions and configuration settings that might cause inefficient simulation performance (Figure 7). The Performance Advisor analyzes the model and produces a report that lists the suboptimal conditions or settings that it finds. It suggests better model configuration settings where appropriate, and provides mechanisms for fixing issues automatically or manually. The techniques recommended in this article, as well as other approaches, can be automatically tested in the Performance Advisor.

## Applying the Techniques and Measuring the Results

To illustrate the relative effectiveness of these techniques on a realistic project, we measured the simulation time performance improvement provided by applying some of the changes suggested above to a model of an automatic transmission system (Figure 8).

Figure 8. Simulink model of an automatic transmission system.

To improve performance, we made the following changes:

• Simplified the graphics used in the model (changed the engine image file format from TIF to JPG, removed the transmission image, simplified the car line art)
• Loaded data via a MAT-file instead of a script in PreLoadFcn, and decimated logged signal vehicle_speed by 10
• Replaced Interpreted MATLAB Function blocks with MATLAB functions (This change has the greatest single effect.)
• Enabled the following optimizations: Block reduction, Implement logic signals as Boolean (vs. double) data, and Inline parameters
• Disabled expensive diagnostics that check for solver data inconsistency, division by singular matrix, Inf or NaN block output, simulation range checking, and array bounds exceeded
• Removed Scopes and enabled optimizations for Accelerator mode
• Replaced an ode3 fixed-step solver with an ode23 variable-step solver, and set max step size to auto

These changes reduced the time for a single simulation run from 9.06 seconds to 0.98 seconds! Using the optimized model, we can now apply rapid acceleration and parallel simulation to compare the performance of those techniques (Table 1).

| Configuration | Time |
| --- | --- |
| Original Model, Normal Mode, Serial Execution | 453 seconds |
| Improved Model, Normal Mode, Serial Execution | 49.1 seconds |
| Improved Model, Rapid Accelerator Mode, Serial Execution | 31.4 seconds |
| Improved Model, Normal Mode, Parallel Simulation (6 local workers) | 13.2 seconds |
| Improved Model, Rapid Accelerator Mode, Parallel Simulation (6 local workers) | 8.58 seconds |
| Improved Model, Rapid Accelerator Mode, Parallel Simulation (12 local workers) | 4.86 seconds |

Table 1. Time for 50 iterations of each model for 1000 seconds of simulation time. System used: Lenovo S20 with Intel® Xeon W3690 (3.46 GHz, 12 MB cache), 24 GB DDR3 1333 MHz PC3-10600 SDRAM, Drive 1: 500 GB 7200 RPM, Drive 2: 160 GB SSD with TRIM Support; MATLAB R2011b is installed on Drive 2.
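The per-run times and overall speedups implied by Table 1 follow from simple division. The sketch below uses only the times reported in the table; the dictionary keys are abbreviated labels.

```python
# Times from Table 1, in seconds, for 50 iterations.
times = {
    "Original, Normal, serial": 453.0,
    "Improved, Normal, serial": 49.1,
    "Improved, Rapid Accelerator, serial": 31.4,
    "Improved, Normal, parallel (6 workers)": 13.2,
    "Improved, Rapid Accelerator, parallel (6 workers)": 8.58,
    "Improved, Rapid Accelerator, parallel (12 workers)": 4.86,
}
ITERATIONS = 50
baseline = times["Original, Normal, serial"]

# Time per run and speedup relative to the original model.
per_run = {name: total / ITERATIONS for name, total in times.items()}
speedup = {name: baseline / total for name, total in times.items()}

for name in times:
    print(f"{name}: {per_run[name]:.2f} s per run, {speedup[name]:.1f}x")
```

The first two rows reproduce the 9.06-second and 0.98-second single-run figures quoted above, and the last row corresponds to a speedup of roughly 93 times over the original model.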

Published 2012 - 92002v01