Multicore Simulation of Comparing Demodulation Types

This example compares log-likelihood ratio (LLR) demodulation with hard-decision demodulation. It uses the dataflow domain in Simulink® to automatically partition the data-driven portions of the communications system into multiple threads, improving simulation performance by executing the model on your desktop's multiple cores.

Introduction

The dataflow execution domain allows you to make use of multiple cores in the simulation of computationally intensive systems. This example shows how dataflow as the execution domain of a subsystem improves simulation performance of the model. To learn more about dataflow and how to run Simulink models using multiple threads, see Multicore Execution using Dataflow Domain (DSP System Toolbox).

LLR vs Hard Decision Demodulation

This example shows a communication system that compares bit error rate (BER) performance when using LLR instead of hard-decision demodulation in the decoder. The system has one transmitter, an AWGN channel, and three receivers. Each receiver uses a different decoding technique, and Display blocks show the BER computed for each receiver so that you can compare the three approaches.
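To make the difference between the two demodulator outputs concrete, here is a minimal Python sketch for BPSK over a real AWGN channel. The modulation, the bit mapping (0 to +1, 1 to -1), the noise variance, and all function names are assumptions chosen for illustration; the model in this example may use a different modulation and decoder. Under these assumptions and equal bit priors, the LLR reduces to 2*y/noise_var.

```python
import math
import random

def bpsk_modulate(bits):
    """Map bit 0 -> +1.0 and bit 1 -> -1.0 (assumed mapping)."""
    return [1.0 - 2.0 * b for b in bits]

def awgn(symbols, noise_var, rng):
    """Add real Gaussian noise with variance noise_var to each symbol."""
    sigma = math.sqrt(noise_var)
    return [s + rng.gauss(0.0, sigma) for s in symbols]

def hard_decision(received):
    """Return one bit per sample; the reliability of each sample is discarded."""
    return [0 if y >= 0.0 else 1 for y in received]

def llr(received, noise_var):
    """Return ln(P(bit=0 | y) / P(bit=1 | y)) per sample.

    For the assumed mapping and equal priors this equals 2*y/noise_var.
    The sign carries the decision, the magnitude carries the confidence.
    """
    return [2.0 * y / noise_var for y in received]

# Hypothetical usage: eight random bits through a noisy channel.
rng = random.Random(0)
tx_bits = [rng.randint(0, 1) for _ in range(8)]
rx = awgn(bpsk_modulate(tx_bits), noise_var=0.5, rng=rng)
print("hard bits:", hard_decision(rx))
print("LLRs     :", [round(v, 2) for v in llr(rx, noise_var=0.5)])
```

A soft-input decoder can weight samples with large LLR magnitudes more heavily than samples near zero, which is the information a hard decision throws away.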

Setting up Dataflow Subsystem

This example uses the dataflow domain in Simulink to make use of multiple cores on your desktop and improve simulation performance. The Domain parameter of the Dataflow Subsystem in this model is set to Dataflow. To view this setting, select the subsystem and then select View > Property Inspector. A dataflow domain automatically partitions your model and simulates the system using multiple threads for better simulation performance. Once the Domain parameter is set to Dataflow, you can use the Dataflow Simulation Assistant to analyze your model and get better performance. To open the Dataflow Simulation Assistant, click the Dataflow assistant button below the Automatic frame size calculation parameter in the Property Inspector.

Analyzing Concurrency in Dataflow Subsystem

The Dataflow Simulation Assistant suggests changing model settings for optimal simulation performance. To accept the proposed model settings, next to Suggested model settings for simulation performance, click Accept all. Alternatively, you can expand the section to change the settings individually. In this example the model settings are already optimal. In the Dataflow Simulation Assistant, click the Analyze button to start the analysis of the dataflow domain for simulation performance. Once the analysis is finished, the Dataflow Simulation Assistant shows how many threads the dataflow subsystem will use during simulation.

After analyzing the model, the assistant shows three threads because the three receivers can run independently in parallel. When the latency is zero, dataflow can use only this inherent parallelism in the model. All three receivers depend on data from the single transmitter, which creates a bottleneck: the transmitter must finish its processing before any receiver can start. Pipelining the data-dependent blocks lets the Dataflow Subsystem increase concurrency and achieve higher data throughput. The Dataflow Simulation Assistant shows the recommended number of pipeline delays as Suggested Latency, a value computed to give the best performance.
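The effect of a pipeline delay on throughput can be illustrated with a small, idealized timing model. This Python sketch is not how Simulink partitions or schedules the model; it assumes a simplified two-stage system (one transmitter feeding parallel receivers), hypothetical per-frame stage times, and no synchronization overhead.

```python
def frame_period_without_pipelining(tx_time, rx_times):
    """Receivers run in parallel, but each frame waits for the transmitter first."""
    return tx_time + max(rx_times)

def frame_period_with_pipelining(tx_time, rx_times):
    """With a pipeline delay between the stages, the transmitter works on
    frame n+1 while the receivers process frame n, so the slowest stage
    sets the steady-state time per frame."""
    return max(tx_time, max(rx_times))

# Hypothetical per-frame costs in milliseconds (not measured from the model).
tx = 2.0
rx = [3.0, 4.0, 3.5]

serial = frame_period_without_pipelining(tx, rx)   # 2.0 + 4.0 = 6.0
piped = frame_period_with_pipelining(tx, rx)       # max(2.0, 4.0) = 4.0
print(f"per-frame time without pipelining: {serial} ms")
print(f"per-frame time with pipelining:    {piped} ms")
```

In this simplified model the gain in throughput comes at the cost of output latency: results for a given frame appear one frame later, which is the trade that the latency setting expresses.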

The following diagram shows the Dataflow Simulation Assistant where the Dataflow Subsystem currently specifies a latency value of zero, and the suggested latency for the system is two. Using the Suggested Latency value introduces pipeline delays in the model and enables more blocks to run in parallel.

Click the Accept button next to Suggested Latency in the Dataflow Simulation Assistant to use the recommended latency for the Dataflow Subsystem.

The Dataflow Simulation Assistant now shows the number of threads as five, meaning that the blocks inside the Dataflow Subsystem simulate in parallel using five threads. Adding two pipeline delays increased the number of blocks that can run in parallel inside the Dataflow Subsystem. You can also enter the latency value directly in the Latency parameter in the Property Inspector. Simulink shows the latency parameter value using $Z^{-1}$ tags at the output ports of the Dataflow Subsystem.

Multicore Simulation Performance

To measure the performance improvement from using multiple cores, compare the execution time of the model when it runs using multiple threads with the execution time when the model does not use dataflow. Execution time is measured using the sim command, which returns the simulation execution time of the model. These numbers and analysis were published using a Windows desktop computer with an Intel® Xeon® W-2133 CPU @ 3.6 GHz (6 cores, 12 threads).

Simulation execution time for multithreaded model = 4.50s
Simulation execution time for single-threaded model = 9.76s
Actual speedup with dataflow: 2.2x
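
The reported speedup is the ratio of the single-threaded time to the multithreaded time. As a quick check of the arithmetic, this Python sketch recomputes it from the two published timings; the function name is illustrative only.

```python
def speedup(single_threaded_s, multithreaded_s):
    """Ratio of baseline execution time to multithreaded execution time."""
    return single_threaded_s / multithreaded_s

# Timings published above, in seconds.
ratio = speedup(9.76, 4.50)
print(f"Actual speedup with dataflow: {ratio:.1f}x")  # prints 2.2x
```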

Summary

This example shows how the dataflow execution domain can improve the simulation performance of a communication system model by using multiple cores on the desktop.