
Object Tracking using 2-D FFT

This example shows how to implement an object tracking algorithm on an FPGA. The model can be configured to support a high frame rate of 1080p at 120 fps.

High-speed object tracking is essential for many computer vision tasks and has applications across the automotive, aerospace, and defense sectors. The tracking technique used here is adaptive template matching, in which the best match of a template within an input image region is detected at each frame.

Download Input File

This example uses the quadrocopter.avi file from the Linkoping Thermal InfraRed (LTIR) dataset [2] as an input. The file is approximately 3 MB in size. Download the file from the MathWorks website and unzip the downloaded file.

LTIRZipFile = matlab.internal.examples.downloadSupportFile('visionhdl','');
[outputFolder,~,~] = fileparts(LTIRZipFile);
unzip(LTIRZipFile,outputFolder);
quadrocopterVideoFile = fullfile(outputFolder,'LTIR_dataset');


The example model provides two subsystems: a behavioral design that uses Computer Vision Toolbox, and an HDL design that uses Vision HDL Toolbox and supports HDL code generation. The ObjectTrackerHDL subsystem is the hardware part of the design and takes a pixel stream as input. The ROI Selector block dynamically selects an active region of the pixel stream that corresponds to a square search template. This template is 2-D correlated with an initialized adaptive filter. The location of the correlation maximum determines the new template location and is used to shift the template in the next frame.

The ObjectTrackerHDL subsystem provides two configuration mask parameters:

  • ObjectCenter: The x- and y-coordinate pair that indicates the center of the object, and therefore of the template.

  • templateSize: Size of the square template. Allowable sizes are powers of two from 16 to 256.

modelname = 'ObjectTrackerHDL';

Object Tracker HDL Subsystem

The input to the design is a grayscale or thermal uint8 image, and the input image can be of custom size. Thermal image tracking can involve additional challenges, such as fast motion and illumination variation, so a higher frame rate is usually desirable for most infrared (IR) applications.

The ObjectTrackerHDL design consists of three subsystems: Preprocess, Track, and Overlay. The Preprocess subsystem selects the template and performs mean subtraction, variance normalization, and windowing to better emphasize the target. The Track subsystem tracks the template across frames. The Overlay subsystem contains the VideoOverlay block, which accepts the pixel stream and the position of the template, and overlays the template boundary onto the frame for viewing. It provides five color options and configurable opacity for better visualization.

open_system([modelname '/ObjectTrackerHDL'],'force');
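As a rough illustration of the Preprocess stage, this NumPy sketch performs the same three steps in floating point: mean subtraction, variance normalization, and windowing. It is an approximation, not the fixed-point HDL implementation; the function name and the choice of a Hanning window are assumptions.

```python
import numpy as np

def preprocess_template(patch):
    """Normalize a template patch: subtract the mean, normalize the
    variance, and apply a separable 2-D window to de-emphasize the
    template edges. (Floating-point sketch of the Preprocess logic.)"""
    patch = patch.astype(np.float64)
    patch -= patch.mean()                       # mean subtraction
    patch /= (patch.std() + 1e-9)               # variance normalization
    win = np.outer(np.hanning(patch.shape[0]),  # separable 2-D window
                   np.hanning(patch.shape[1]))
    return patch * win

# example: preprocess a random 128x128 uint8 patch
raw = np.random.default_rng(0).integers(0, 256, (128, 128))
template = preprocess_template(raw)
```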

Tracking Algorithm

The tracking algorithm uses a Minimum Output Sum of Squared Error (MOSSE) filter [1] for correlation. This type of filter minimizes the sum of squared error between the actual and desired correlation output. The initial setup for tracking is a simple training procedure that runs when the model initializes; the InitFcn callback provides this setup. During this setup, the filter is pretrained using random affine transformations of the first-frame template. The desired training output is a 2-D Gaussian centered on the training input. You can additionally update these configuration parameters in the InitFcn to better suit a given application.

  • eta ($\eta$): The learning rate, or the weight given to the coefficients from the previous frame.

  • sigma ($\sigma$): The variance of the Gaussian, which controls the sharpness of the desired target response.

  • trainCount: The number of training images used.
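As a rough illustration of this training procedure, the NumPy sketch below builds the 2-D Gaussian training output and accumulates the initial filter terms over perturbed copies of the first-frame template. Random circular shifts stand in for the model's random affine transformations, and the function names, shift range, and default parameter values are illustrative assumptions.

```python
import numpy as np

def gaussian_target(h, w, cy, cx, sigma):
    """Desired correlation output: a 2-D Gaussian centered on the target."""
    y, x = np.ogrid[:h, :w]
    return np.exp(-((y - cy)**2 + (x - cx)**2) / (2 * sigma**2))

def pretrain(template, sigma=2.0, train_count=8, rng=None):
    """Accumulate the filter numerator A and denominator B over perturbed
    copies of the first-frame template. Random circular shifts stand in
    for the random affine transformations (an illustrative simplification)."""
    rng = rng or np.random.default_rng(0)
    h, w = template.shape
    G = np.fft.fft2(gaussian_target(h, w, h // 2, w // 2, sigma))
    A = np.zeros((h, w), complex)
    B = np.zeros((h, w), complex)
    for _ in range(train_count):
        dy, dx = rng.integers(-2, 3, size=2)           # small perturbation
        F = np.fft.fft2(np.roll(template, (dy, dx), axis=(0, 1)))
        A += G * np.conj(F)
        B += F * np.conj(F)
    return A, B    # initial filter coefficients: H* = A / B
```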

After the training procedure, the initial coefficients of the filter are available and are loaded as constants in the model. The adaptive algorithm then updates the filter coefficients after each frame. Let $G_i$ be the desired correlation output. The algorithm derives a filter $H_i$ such that its correlation with the template $F_i$ satisfies this optimization equation:

$\min_{H^*}\sum_{i}{|F_i \odot H^*-G_i|}^2$

This equation has a closed-form solution. The filter applied at frame $i$ is the element-wise ratio

$H_i^*=\frac{A_i}{B_i}$

where the numerator and denominator are accumulated recursively using the learning rate $\eta$:

$A_i=\eta G_i \odot F_i^* + (1-\eta)A_{i-1}$

$B_i=\eta F_i \odot F_i^* + (1-\eta)B_{i-1}$

The learning rate controls how much the previous frames influence the filter as it adapts to follow the tracked object. The algorithm is iterative: at each frame, the template is correlated with the filter, and the location of the correlation maximum guides the selection of the new template.
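The recursive update above maps directly to a few lines of code. In this NumPy sketch (the function name and the default value of $\eta$ are assumptions), one call performs the per-frame coefficient update:

```python
import numpy as np

def update_filter(A_prev, B_prev, frame_template, G, eta=0.125):
    """One frame of the adaptive filter update:
        A_i = eta * G * conj(F_i) + (1 - eta) * A_prev
        B_i = eta * F_i * conj(F_i) + (1 - eta) * B_prev
    The filter applied to the next frame is H* = A_i / B_i."""
    F = np.fft.fft2(frame_template)
    A = eta * G * np.conj(F) + (1 - eta) * A_prev
    B = eta * F * np.conj(F) + (1 - eta) * B_prev
    return A, B
```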

Track Subsystem

After the pixel stream is preprocessed, the Track subsystem performs 2-D correlation between the template and the filter. The subsystem first converts the template to the frequency domain by using a 2-D FFT; in the frequency domain, correlation is efficiently implemented as element-wise multiplication. The Maxcorrelation subsystem finds the row and column in the template where the maximum correlation value occurs. As pixels stream in, the subsystem compares each pixel against the running maximum, and the HV Counter block records the location of that maximum. If more than one pixel equals the maximum value, the subsystem uses the mean of the tied locations: when an incoming pixel matches the current maximum, the stored location is updated to the mean of both locations. This process repeats until a new maximum is found or all pixels in the frame have been processed. The ROIUpdate subsystem then shifts the center of prevROI to the maximum correlation point to yield currROI.

open_system([modelname '/ObjectTrackerHDL/Track'],'force');
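In software, the streaming maximum search reduces to an argmax with tie handling. This NumPy sketch (the function name is an assumption) returns the mean of tied locations, mirroring the behavior of the Maxcorrelation subsystem:

```python
import numpy as np

def peak_location(corr):
    """Locate the correlation peak. When the maximum value occurs at
    more than one pixel, return the mean of the tied locations.
    (Batch sketch of the streaming compare-and-track logic.)"""
    rows, cols = np.nonzero(corr == corr.max())
    return rows.mean(), cols.mean()
```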

2-D Correlation Subsystem

The 2-D correlation is performed in the frequency domain. At each frame, the 2-DCorrelation subsystem processes two templates: the previous and current ROI templates. Both templates are converted to the frequency domain by using a 2-D FFT. The current template is used to update the coefficients of the filter. The CoefficientsUpdate subsystem contains RAM blocks that store the filter coefficients in the frequency domain, updated for use in the next frame, so that they can be element-wise multiplied with the template spectrum to produce the correlation output. The two pixel streams are aligned before multiplication; the alignment is controlled by comparing the previous and current ROI values. The result is converted back to the spatial domain by using an IFFT.

open_system([modelname '/ObjectTrackerHDL/Track/2-DCorrelation'],'force');
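The element-wise multiplication and inverse transform can be sketched as follows in NumPy. This is an illustrative floating-point model, not the streaming fixed-point hardware, and the function name is an assumption:

```python
import numpy as np

def correlate_freq(template, H_conj):
    """Apply the filter in the frequency domain: element-wise multiply
    the template spectrum by the (conjugated) filter coefficients, then
    return to the spatial domain with an inverse 2-D FFT."""
    F = np.fft.fft2(template)
    return np.real(np.fft.ifft2(F * H_conj))
```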

2-D FFT Subsystem

The 2-D FFT is calculated by performing a 1-D FFT across the rows of the template, storing the result, and then performing a 1-D FFT across its columns. For more details, see the FFT (DSP HDL Toolbox) block. The intermediate result is stored in the CornerTurnMemory subsystem, which uses ping-pong buffering to enable high-speed reads and writes.

open_system([modelname '/ObjectTrackerHDL/Track/2-DCorrelation/Prev2-DFFT'],'force');
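The row-column decomposition is easy to verify in NumPy: applying 1-D FFTs across the rows and then across the columns gives the same result as a direct 2-D FFT. The corner turn in hardware corresponds to the reordering between the two passes.

```python
import numpy as np

def fft2_row_column(x):
    """2-D FFT via the row-column method: a 1-D FFT pass across the
    rows, then a 1-D FFT pass across the columns of the stored
    intermediate result (the corner-turned buffer in hardware)."""
    row_pass = np.fft.fft(x, axis=1)     # 1-D FFT of each row
    return np.fft.fft(row_pass, axis=0)  # 1-D FFT of each column
```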

Simulation and Output

At the end of each frame, the model updates the video display for both the behavioral and HDL designs. Although the two outputs closely follow each other, a slight deviation in one can compound over a few frames. Both designs can independently track an object through the video. This example uses the quadrocopter sequence from the Linkoping Thermal InfraRed (LTIR) dataset [2], which contains 480p uint8 images, with a template size of 128. The object being tracked is a quadrocopter.

Implementation Results

To check and generate the HDL code referenced in this example, you must have the HDL Coder™ product. To generate the HDL code, use this command.


The generated code was synthesized for a Xilinx ZCU106 SoC target. The design met a 285 MHz timing constraint for a template size of 128. The table shows the hardware resource usage.

T = table(...
    categorical({'DSP48';'Register';'LUT';'BRAM';'URAM'}),...
    categorical({'260 (15.05%)';'65311 (14.17%)';'45414 (19.71%)';'95 (30.44%)';'36 (37.5%)'}),...
    'VariableNames',{'Resource','Usage'})
T =

  5×2 table

    Resource        Usage     
    ________    ______________

    DSP48       260 (15.05%)  
    Register    65311 (14.17%)
    LUT         45414 (19.71%)
    BRAM        95 (30.44%)   
    URAM        36 (37.5%)    


[1] D. S. Bolme, J. R. Beveridge, B. A. Draper and Y. M. Lui, "Visual object tracking using adaptive correlation filters," 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2010, pp. 2544-2550, doi: 10.1109/CVPR.2010.5539960.

[2] A. Berg, J. Ahlberg and M. Felsberg, "A Thermal Object Tracking Benchmark," 2015 12th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2015, pp. 1-6, doi: 10.1109/AVSS.2015.7301772.