Use Dynamically Allocated C++ Arrays in Generated Function Interfaces

In most cases, when you generate code for a MATLAB® function that accepts or returns an array, there is an array at the interface of the generated CUDA® function. For an array whose size is unknown at compile time, or whose bound exceeds a predefined threshold, the memory for the generated array is dynamically allocated.

By default, dynamically allocated arrays are implemented by using the C style emxArray data structure in the generated code. Alternatively, dynamically allocated arrays can be implemented by using a class template called coder::gpu_array. coder::gpu_array offers several advantages over emxArray style data structures:

The generated code is exception safe.
The generated code is easier to read.
C++ integration is simpler because it is easier to initialize the input data and work with the output data.
Because coder::gpu_array is defined in a header file that ships with MATLAB, you can write the interface code before generating the code.

To use dynamically allocated arrays in your custom CUDA code that you integrate with the generated CUDA C++ functions, learn to use the coder::gpu_array template.

Change Interface Generation

By default, the generated CUDA code uses the C style emxArray data structure to implement dynamically allocated arrays. Instead, you can choose to generate CUDA code that uses the coder::gpu_array template to implement dynamically allocated arrays. To generate the coder::gpu_array template, do one of the following:

In a code configuration object (coder.MexCodeConfig, coder.CodeConfig, or coder.EmbeddedCodeConfig), set the DynamicMemoryAllocationInterface parameter to 'C++'.
In the GPU Coder™ app settings, on the Memory tab, set Dynamic memory allocation interface to Use C++ coder::array.
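
As a minimal sketch of the command-line option, the following commands create a GPU code configuration object and select the C++ interface. The choice of a static library ('lib') build is only an illustration; the same property setting applies to other build types.

```
% Create a GPU code configuration object (static library build chosen for illustration)
cfg = coder.gpuConfig('lib');

% Implement dynamically allocated arrays with the C++ class templates
% instead of the default C style emxArray data structure
cfg.DynamicMemoryAllocationInterface = 'C++';
```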

Using the `coder::gpu_array` Class Template

When you generate CUDA code for your MATLAB functions, the code generator produces the header files coder_gpu_array.h and coder_array.h in the build folder. The coder_gpu_array.h header file contains the definition of the class template gpu_array in the namespace coder and the definitions of the function templates arrayCopyCpuToGpu and arrayCopyGpuToCpu. The coder::gpu_array template implements the dynamically allocated arrays in the generated code. The declaration for this template is:

template <typename T, int32_T N> class gpu_array

The array contains elements of type T and has N dimensions. For example, to declare a two-dimensional dynamic array myArray that contains elements of type int32_T in your custom CUDA code, use:

coder::gpu_array<int32_T, 2> myArray

The function templates arrayCopyCpuToGpu and arrayCopyGpuToCpu implement data transfers between the CPU and GPU memory. On the CPU, the dynamically allocated arrays are implemented by using the coder::array template. For more information on the APIs you use to create and interact with dynamic arrays in your custom code, see Use Dynamically Allocated C++ Arrays in Generated Function Interfaces.

To use dynamically allocated arrays in your custom CUDA code that you want to integrate with the generated code (for example, a custom main function), include the coder_gpu_array.h and coder_array.h header files in your custom .cu files.

Generate C++ Code That Accepts and Returns a Variable-Size Numeric Array

This example shows how to customize the generated example main function to use the coder::gpu_array and coder::array class templates in your project.

Your goal is to generate a CUDA executable for xTest1 that can accept and return an array of int32_T elements. You want the first dimension of the array to be singleton and the second dimension to be unbounded.

Define a MATLAB function xTest1 that accepts an array X, adds the scalar A to each of its elements, and returns the resulting array Y.
```
function Y = xTest1(X, A)
Y = X;
for i = 1:numel(X)
    Y(i) = X(i) + A;
end
```
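
Before generating code, it can help to confirm in MATLAB that the function behaves as intended. This is a minimal sketch that assumes xTest1.m is on the MATLAB path; the five-element input is an illustrative value, not part of the original example.

```
% Call xTest1 with a small int32 row vector and an int32 scalar
X = int32(0:4);
A = int32(1000);
Y = xTest1(X, A);

% Each element of Y should equal the corresponding element of X plus A
assert(isequal(Y, int32(1000:1004)));
assert(isa(Y, 'int32'));
```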
Generate initial source code for xTest1 and move xTest1.h from the code generation folder to your current folder. Use the following commands:
```
cfg = coder.gpuConfig('lib');
cfg.DynamicMemoryAllocationInterface = 'C++';
cfg.GenerateReport = true;
inputs = {coder.typeof(int32(0), [1 inf]), int32(0)};

codegen -config cfg -args inputs xTest1.m
```
The function prototype for xTest1 in the generated code is shown here:
```
extern void xTest1(const coder::array<int, 2U> &X, int A,
                   coder::array<int, 2U> &Y);
```
Interface with the generated code by providing input and output arrays that are compatible with the function prototype shown above.

Define a CUDA main function in the file xTest1_main.cu in your current working folder.

This main function includes the header file coder_array.h, which contains the coder::array class template definition used at the CPU interface of xTest1. If your custom code also uses coder::gpu_array directly, include coder_gpu_array.h as well. The main function performs these actions:

Declares myArray and myResult as two-dimensional coder::array dynamic arrays of int32_T elements.
Dynamically sets the sizes of the two dimensions of myArray to 1 and 100 by using the set_size method.
Accesses the size of the second dimension of myResult by using myResult.size(1).

```
#include <iostream>
#include <coder_array.h>
#include <xTest1.h>

int main(int argc, char *argv[])
{
    static_cast<void>(argc);
    static_cast<void>(argv);

    // Instantiate the input variable by using coder::array template
    coder::array<int32_T, 2> myArray;

    // Allocate initial memory for the array
    myArray.set_size(1, 100);

    // Access array with standard C++ indexing
    for (int i = 0; i < myArray.size(1); i++) {
        myArray[i] = i;
    }

    // Instantiate the result variable by using coder::array template
    coder::array<int32_T, 2> myResult;

    // Pass the input and result arrays to the generated function
    xTest1(myArray, 1000, myResult);

    // Print the result, ten values per line
    for (int i = 0; i < myResult.size(1); i++) {
        if (i > 0) std::cout << " ";
        std::cout << myResult[i];
        if (((i+1) % 10) == 0) std::cout << std::endl;
    }
    std::cout << std::endl;

    return 0;
}
```

Generate code by running this script:

```
cfg = coder.gpuConfig('exe');
cfg.DynamicMemoryAllocationInterface = 'C++';
cfg.GenerateReport = true;
cfg.CustomSource = 'xTest1_main.cu';
cfg.CustomInclude = '.';
codegen -config cfg -args inputs xTest1_main.cu xTest1.m
```

The code generator produces an executable file xTest1 in your current working folder. Run the executable using the following commands:

```
if ispc
  !xTest1.exe
else
  !./xTest1
end
```

```
 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009
 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019
 1020 1021 1022 1023 1024 1025 1026 1027 1028 1029
 1030 1031 1032 1033 1034 1035 1036 1037 1038 1039
 1040 1041 1042 1043 1044 1045 1046 1047 1048 1049
 1050 1051 1052 1053 1054 1055 1056 1057 1058 1059
 1060 1061 1062 1063 1064 1065 1066 1067 1068 1069
 1070 1071 1072 1073 1074 1075 1076 1077 1078 1079
 1080 1081 1082 1083 1084 1085 1086 1087 1088 1089
 1090 1091 1092 1093 1094 1095 1096 1097 1098 1099
```

Limitations

To generate CUDA code that uses coder::gpu_array, the GPU memory allocation mode must be set to discrete.
To change the memory allocation mode in the GPU Coder app settings, in the GPU Code section, use the Malloc mode parameter. At the command line, use the MallocMode build configuration property, which accepts 'discrete' or 'unified'; set it to 'discrete' when you use coder::gpu_array.
GPU Coder does not support coder::gpu_array in Simulink®.
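
The memory allocation requirement can be sketched at the command line as follows. This sketch assumes that the MallocMode property is reached through the GpuConfig property of the object returned by coder.gpuConfig; check the property path in your release before relying on it.

```
% Create a GPU code configuration object and select the C++ interface
cfg = coder.gpuConfig('lib');
cfg.DynamicMemoryAllocationInterface = 'C++';

% Assumption: MallocMode is a property of cfg.GpuConfig.
% coder::gpu_array requires the discrete memory allocation mode.
cfg.GpuConfig.MallocMode = 'discrete';
```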