Accelerate BER Measurement for Wireless HDL LTE Turbo Decoder
This example shows the workflow to measure the BER of the Wireless HDL Toolbox™ LTE Turbo Decoder block using parsim to parallelize the simulations across EbNo points. This approach can be used to accelerate other Monte Carlo simulations.
Introduction
HDL implementations of reference applications are often complex and slow to simulate. As a result, measuring the bit error rate (BER) performance by running multiple simulations at different SNR points can be very time consuming. One way to reduce the total run time is to parallelize the simulations using the parsim command. The parsim command runs multiple simulations in parallel when a Parallel Computing Toolbox™ license is available. This example measures the BER of the LTE Turbo Decoder. To achieve sufficient statistical accuracy, around 100 errors must be obtained at the decoder for each EbNo value. This translates to 1e8 bits at a BER of 1e-6. This type of Monte Carlo simulation is a suitable candidate to parallelize using parsim, where the BER measurement for every EbNo point runs on the workers in parallel.
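The bit budget follows directly from the error target. The short sketch below works through that arithmetic; the variable names are illustrative and not part of the example model, and the block size of 6144 is the one this example configures for the decoder.

```matlab
% Bits needed per EbNo point to observe a target number of bit errors.
targetErrors = 100;    % rule of thumb for sufficient statistical accuracy
targetBER    = 1e-6;   % assumed operating point of interest
blkSize      = 6144;   % turbo code block size used in this example

bitsNeeded   = targetErrors/targetBER;     % information bits per EbNo point
framesNeeded = ceil(bitsNeeded/blkSize);   % whole frames needed to cover them

fprintf('Bits per EbNo point:   %g\n', bitsNeeded);
fprintf('Frames per EbNo point: %d\n', framesNeeded);
```

At lower target BERs the bit count grows in inverse proportion, which is why splitting the work across workers pays off.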
For every parallel simulation, this example sets up the input data as follows:
Generate input data frames
Turbo encode
QPSK modulate
Add AWGN based on the EbNo value
Demodulate the noisy symbols
Generate soft decisions
The soft decisions become the input to the LTE Turbo Decoder in Simulink®. The turbo decoded bits are compared to the transmitted bits to calculate the BER. Each parallel simulation sends the results back to the main host.
Configure Parameters and Simulation Objects
The total number of information bits for each EbNo point, bitsPerEbNo, is divided over multiple simulations, defined by parsimPerEbNo. In this way, every simulation runs bitsPerParsim bits for a single EbNo point. The total number of simulations is length(EbNo)*parsimPerEbNo. This example is configured to run only a small number of bits for demonstration purposes. In a real scenario, you must run a sufficient number of samples through the decoder for an accurate measure of the BER at the higher EbNo points. When choosing these parameters, consider the memory resources available on the host. A large input data set per simulation or a large number of workers could result in a slowdown or memory exhaustion. The structure simParam contains the parameters required for each simulation. This structure is passed to the simulations at a later stage.
EbNo = 0:0.1:1.1;
bitsPerEbNo = 1e5; %1e8;
parsimPerEbNo = 2; %10;
bitsPerParsim = ceil(bitsPerEbNo/parsimPerEbNo);
simParam.blkSize = 6144;
simParam.turboIterations = 6;
simParam.numFrames = ceil(bitsPerParsim/simParam.blkSize); % frames per simulation
simParam.modScheme = 'QPSK';
simParam.bps = 2; % bits per symbol
tailBits = 4; % encoder property
simParam.encoderRate = simParam.blkSize/(3*(simParam.blkSize+tailBits)); % rate 1/3 Turbo code
simParam.samplesizeIn = floor(1/simParam.encoderRate); % 3 samples in at a time
simParam.inframeSize = simParam.samplesizeIn*(simParam.blkSize+tailBits);
model = 'LTEHDLTurboDecoderBERExample';
open_system(model);
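As a quick check of how these settings divide the work, the sketch below recomputes the derived quantities with plain variables instead of the simParam structure. The names numSims and actualBitsPerEbNo are introduced here for illustration only.

```matlab
% Recompute the derived simulation sizes for the demonstration settings.
EbNo          = 0:0.1:1.1;
bitsPerEbNo   = 1e5;
parsimPerEbNo = 2;
blkSize       = 6144;

bitsPerParsim = ceil(bitsPerEbNo/parsimPerEbNo);   % requested bits per simulation
numFrames     = ceil(bitsPerParsim/blkSize);       % whole frames per simulation
numSims       = length(EbNo)*parsimPerEbNo;        % total parallel simulations

% Rounding up to whole frames means each EbNo point processes slightly
% more bits than requested.
actualBitsPerEbNo = numFrames*blkSize*parsimPerEbNo;

fprintf('%d simulations, %d frames each, %d bits per EbNo point\n', ...
    numSims, numFrames, actualBitsPerEbNo);
```

Because frames are the unit of work, the BER denominator is numFrames*blkSize per simulation rather than bitsPerParsim.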
Start a local parallel pool with a minimum of 1 and a maximum of maxNumWorkers workers. If a Parallel Computing Toolbox™ license is not available, the simulations run serially. The actual size of the pool depends on the number of available cores. Each parallel worker is assigned one core, on which an independent MATLAB® session is launched.
maxNumWorkers = 3;
pool = parpool('local', [1 maxNumWorkers]);
Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 3).
Preallocate an array of Simulink.SimulationInput objects, one per simulation, to hold the data that parsim requires for each run. Each object can also include handles to functions that are called before or after a simulation. The MATLAB® session on which parsim is executed acts as the main host. The main host is responsible for launching the simulations on the workers, sending the required data to every worker, and receiving the results.
parsimIn(1:length(EbNo)*parsimPerEbNo) = Simulink.SimulationInput(model);
Replicate the EbNo points to set up parsimPerEbNo simulations for each point.
repEbNo = repmat(EbNo,parsimPerEbNo,1);
repEbNo = repEbNo(:);
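The ordering produced by this replication matters later, when the BER results are reshaped per EbNo point. The sketch below uses a shortened, hypothetical sweep (EbNoDemo) to show that the repeats of each point end up adjacent.

```matlab
% Show the ordering produced by repmat followed by column-major flattening.
EbNoDemo      = [0 0.5 1];   % hypothetical shortened sweep
parsimPerEbNo = 2;

repEbNo = repmat(EbNoDemo,parsimPerEbNo,1);   % 2-by-3, one column per EbNo point
repEbNo = repEbNo(:);                         % flatten column by column

disp(repEbNo.');
```

Each EbNo value appears parsimPerEbNo times in a row, so simulations 1 and 2 share the first point, simulations 3 and 4 share the second, and so on.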
Minimizing data transmission to the workers improves the performance and stability of the main host. Therefore, this example generates the input data on each worker, rather than passing a large input data set from the host to each worker. The pre-simulation function, presimGenInput, generates the input data, and the post-simulation function, postsimOutput, calculates the BER. The pre-simulation function handle is assigned to each SimulationInput object. The post-simulation function is assigned inside the pre-simulation function, as shown in the section Pre-Simulation and Post-Simulation Functions.
for noiseRatio = 1:length(repEbNo)
    % Calculate the noise variance.
    EsNo = repEbNo(noiseRatio) + 10*log10(simParam.bps);
    snrdB = EsNo + 10*log10(simParam.encoderRate);
    noiseVar = 1./(10.^(snrdB/10));
    % Use random but reproducible data.
    seed = noiseRatio;
    % For Rapid Accelerator mode, set the simulation
    % stop time before compilation.
    parsimIn(noiseRatio) = parsimIn(noiseRatio).setModelParameter('StopTime',num2str(simParam.numFrames));
    % Set pre-simulation function.
    parsimIn(noiseRatio) = parsimIn(noiseRatio).setPreSimFcn(@(simIn) presimGenInput(simIn,noiseVar,seed,simParam));
end
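The noise variance calculation in the loop converts Eb/No to Es/No by adding the bits-per-symbol term, then to the channel SNR by adding the code rate term. The sketch below works through a single point, assuming unit average symbol power; EbNodB and noiseStdPerDim are names chosen for this illustration.

```matlab
% Worked example: Eb/No (dB) to noise variance for one sweep point.
EbNodB   = 1;      % example EbNo point in dB
bps      = 2;      % QPSK, bits per symbol
blkSize  = 6144;
tailBits = 4;
encoderRate = blkSize/(3*(blkSize+tailBits));   % slightly below 1/3

EsNo     = EbNodB + 10*log10(bps);              % energy per symbol, dB
snrdB    = EsNo + 10*log10(encoderRate);        % account for coding overhead
noiseVar = 1/(10^(snrdB/10));                   % assumes unit symbol power

% Complex noise splits its variance equally between real and imaginary parts.
noiseStdPerDim = sqrt(noiseVar/2);

fprintf('SNR = %.4f dB, noise variance = %.4f\n', snrdB, noiseVar);
```

At this point the SNR is below 0 dB, so the noise variance is greater than the signal power. This is the low-SNR region in which the turbo decoder operates.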
Run the simulations and show their progress in the command window. At the end of the simulations, the results are sent back to the main host in an array of structures, parsimOut, with one entry created per simulation. Once the simulations are complete, shut down the parallel pool.
parsimOut = parsim(parsimIn,'ShowProgress','on','StopOnError','on');
delete(pool);
[16-Jul-2021 12:34:57] Checking for availability of parallel pool...
[16-Jul-2021 12:34:58] Starting Simulink on parallel workers...
[16-Jul-2021 12:35:53] Configuring simulation cache folder on parallel workers...
[16-Jul-2021 12:35:53] Loading model on parallel workers...
[16-Jul-2021 12:36:06] Running simulations...
[16-Jul-2021 12:38:49] Completed 1 of 24 simulation runs
[16-Jul-2021 12:38:49] Completed 2 of 24 simulation runs
[16-Jul-2021 12:38:49] Completed 3 of 24 simulation runs
[16-Jul-2021 12:38:56] Completed 4 of 24 simulation runs
[16-Jul-2021 12:38:56] Completed 5 of 24 simulation runs
[16-Jul-2021 12:38:56] Completed 6 of 24 simulation runs
[16-Jul-2021 12:39:03] Completed 7 of 24 simulation runs
[16-Jul-2021 12:39:03] Completed 8 of 24 simulation runs
[16-Jul-2021 12:39:03] Completed 9 of 24 simulation runs
[16-Jul-2021 12:39:09] Completed 10 of 24 simulation runs
[16-Jul-2021 12:39:09] Completed 11 of 24 simulation runs
[16-Jul-2021 12:39:09] Completed 12 of 24 simulation runs
[16-Jul-2021 12:39:15] Completed 13 of 24 simulation runs
[16-Jul-2021 12:39:16] Completed 14 of 24 simulation runs
[16-Jul-2021 12:39:16] Completed 15 of 24 simulation runs
[16-Jul-2021 12:39:21] Completed 16 of 24 simulation runs
[16-Jul-2021 12:39:21] Completed 17 of 24 simulation runs
[16-Jul-2021 12:39:22] Completed 18 of 24 simulation runs
[16-Jul-2021 12:39:27] Completed 19 of 24 simulation runs
[16-Jul-2021 12:39:27] Completed 20 of 24 simulation runs
[16-Jul-2021 12:39:28] Completed 21 of 24 simulation runs
[16-Jul-2021 12:39:33] Completed 22 of 24 simulation runs
[16-Jul-2021 12:39:33] Completed 23 of 24 simulation runs
[16-Jul-2021 12:39:33] Completed 24 of 24 simulation runs
[16-Jul-2021 12:39:33] Cleaning up parallel workers...
Plot BER
Extract the BER values from the array of structures. Combine the BER results for each EbNo point and find the average BER per EbNo point.
BER = [parsimOut(:).BER];
BER = transpose(reshape(BER,parsimPerEbNo,length(BER)/parsimPerEbNo));
avgBER = mean(BER,2);
semilogy(EbNo,avgBER,'-o');
grid;
xlabel('Eb/No (dB)');
ylabel('Bit Error Rate');
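The reshape-and-average step can be checked on a small data set. The sketch below uses hypothetical error counts for three EbNo points with two simulations each. It also shows that averaging the per-simulation BER values matches pooling the error counts, which holds only because every simulation processes the same number of bits.

```matlab
% Check the per-EbNo averaging on hypothetical error counts.
parsimPerEbNo = 2;
bitsPerSim    = 9*6144;                  % numFrames*blkSize for the demo settings
numErrors     = [120 130 40 44 3 5];     % hypothetical, ordered like repEbNo

% Same steps as the example: one row per EbNo point, then average the rows.
BER = numErrors/bitsPerSim;
BER = transpose(reshape(BER,parsimPerEbNo,length(BER)/parsimPerEbNo));
avgBER = mean(BER,2);

% Alternative: pool the error counts of each EbNo point before dividing.
errPerPoint = sum(reshape(numErrors,parsimPerEbNo,[]),1).';
pooledBER   = errPerPoint/(parsimPerEbNo*bitsPerSim);

disp([avgBER pooledBER]);
```

If simulations for the same EbNo point ever ran different numbers of frames, the pooled form would be the one to use.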
The plot below shows the results of the BER measurement with bitsPerEbNo = 1e8.
Pre-Simulation and Post-Simulation Functions
These functions independently generate input data and process output data for each simulation, which eliminates the need for the main host to store the data in memory for all simulations. The presimGenInput function generates input bits, then encodes them, modulates them, adds noise, and converts the result to soft decisions. To make the input frames and parameters available to the model, they are assigned as variables in the global workspace using the setVariable function.
function simIn = presimGenInput(simIn,noiseVar,seed,simParam)
    rng(seed);
    % Preallocate arrays for speed.
    txBits = zeros(simParam.blkSize,simParam.numFrames,'int8');
    inFrames = zeros(simParam.inframeSize,simParam.numFrames,'single');
    % Generate input frames, turbo encode, modulate and add noise based on
    % noise variance.
    for currentFrame = 1:simParam.numFrames
        txBits(:,currentFrame) = randi([0 1],simParam.blkSize,1);
        codedData = lteTurboEncode(txBits(:,currentFrame));
        txSymbols = lteSymbolModulate(codedData,simParam.modScheme);
        noise = (sqrt(noiseVar/2))*complex(randn(size(txSymbols)),randn(size(txSymbols)));
        rxSymbols = txSymbols + noise;
        inFrames(:,currentFrame) = lteSymbolDemodulate(rxSymbols,simParam.modScheme,'Soft');
    end
    % Set up parameters for Frame to Samples block to serialize data.
    % Leave sufficient gap between frames.
    simParam.idleCyclesBetweenSamples = 0;
    halfIterationLatency = (ceil(simParam.blkSize/32)+3)*32; % window size = 32
    algFrameDelay = 2*simParam.turboIterations*halfIterationLatency+(simParam.inframeSize/simParam.samplesizeIn);
    simParam.idleCyclesBetweenFrames = algFrameDelay;
    % Assign variables to global workspace.
    simIn = simIn.setVariable('inFrames',inFrames);
    simIn = simIn.setVariable('simParam',simParam);
    % Set post-simulation function and send required data.
    simIn = simIn.setPostSimFcn(@(simOut) postsimOutput(simOut,txBits,simParam));
end
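The idle gap between frames is sized from the decoder latency expression in presimGenInput. The sketch below evaluates that expression for the settings in this example, using plain variables in place of the simParam fields.

```matlab
% Evaluate the inter-frame gap used by the Frame to Samples block.
blkSize         = 6144;
tailBits        = 4;
turboIterations = 6;
samplesizeIn    = 3;    % soft decisions presented per input sample

inframeSize = samplesizeIn*(blkSize+tailBits);        % soft decisions per frame

% Latency of one half-iteration, rounded up to the 32-sample window.
halfIterationLatency = (ceil(blkSize/32)+3)*32;

% Two half-iterations per turbo iteration, plus the time to load one frame.
algFrameDelay = 2*turboIterations*halfIterationLatency + inframeSize/samplesizeIn;

fprintf('inframeSize = %d, halfIterationLatency = %d, algFrameDelay = %d\n', ...
    inframeSize, halfIterationLatency, algFrameDelay);
```

The gap scales linearly with the iteration count, so changing turboIterations changes the simulation length per frame accordingly.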
The post-simulation function receives the outputs of the simulation and computes the BER. The results are stored in a structure, results, which parsim returns as parsimOut.
function results = postsimOutput(out, txBits, simParam)
    decodedOutValid = out.decodedOut(out.validOut);
    results.numErrors = sum(xor(txBits(:),decodedOutValid));
    results.BER = results.numErrors/(simParam.numFrames*simParam.blkSize);
end
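The BER computation relies on the valid signal to pick the decoded bits out of the sample stream, which also contains idle cycles. The sketch below reproduces the same steps on a small mock output. The struct out and its contents are hypothetical stand-ins for the logged model signals, with a block size of 4 and two frames.

```matlab
% Mock of the post-simulation BER calculation on a tiny data set.
simParam.blkSize   = 4;   % hypothetical, for illustration only
simParam.numFrames = 2;

% Transmitted bits, one frame per column.
txBits = int8([1 0 1 1; 0 0 1 0]).';

% Mock logged signals: valid samples are interleaved with idle cycles.
out.decodedOut = logical([0 1 0 1 1 0 0 0 1 1 0]).';
out.validOut   = logical([0 1 1 1 1 0 0 1 1 1 1]).';

% Keep only the samples flagged as valid, then count mismatches.
decodedOutValid   = out.decodedOut(out.validOut);
results.numErrors = sum(xor(txBits(:),decodedOutValid));
results.BER       = results.numErrors/(simParam.numFrames*simParam.blkSize);

fprintf('Errors: %d, BER: %g\n', results.numErrors, results.BER);
```

The comparison only lines up if the number of valid samples equals numFrames*blkSize, so a mismatch in length is a useful early sign that the simulation stopped before the last frame was decoded.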
Conclusion
This example showed how to efficiently measure the BER curve for the Wireless HDL LTE Turbo Decoder block using parsim. Without a parallel pool, running the simulations serially would take approximately 16 hours. With parallelization across 3 workers, the time to run all simulations came down to 5.4 hours, a speedup of roughly 3x. This was achieved by running the simulations in Rapid Accelerator mode. This workflow can be applied to complex reference applications that require Monte Carlo or other simulations.