Monocular Visual-Inertial SLAM

This example uses:

Simultaneous Localization and Mapping (SLAM) is a complex process that involves determining the position and orientation of a sensor, such as a camera, in relation to its surroundings. It also entails creating a map of the environment by identifying the 3D locations of various points. This example demonstrates how to effectively perform SLAM by combining images captured by a monocular camera with measurements obtained from an IMU sensor.

The method demonstrated in this example is inspired by ORB-SLAM3 which is a feature-based visual-inertial SLAM algorithm. The IMU and camera fusion is achieved using a factorGraph (Navigation Toolbox), which is a bipartite graph that consists of two types of nodes:

Variable Nodes: They represent the unknown random variables in an estimation problem such as the position of a mobile robot or a 3D point.
Factor Nodes: They represent a function which quantifies the relationship between the random variables. For example, a camera correspondence factor between 2D image points and 3D map points or an IMU measurement factor between two positions.

Factor graphs are a popular way of designing algorithms for sensor fusion-based SLAM because they provide an efficient and modular solution to SLAM problems and allow the graph to be easily updated if new measurements or constraints become available. Additionally, this type of model provides a flexible approach incorporating different types of sensors and data, including visual, lidar and inertial sensors, which makes it useful for variety of SLAM applications.

This example illustrates how to construct a monocular visual-inertial SLAM pipeline using a factor graph step by step.

Overview

MicrosoftTeams-image (1).png

The visual-inertial SLAM pipeline includes the following steps:

Initial IMU Bias Estimation (Optional): If your data sequence contains a sufficient number of static frames, you can use them to estimate the IMU bias. This helps you initialize the factor graph later.
Map Initialization: Start by initializing the map of 3-D points from two video frames. The 3-D points and relative camera pose are computed using triangulation based on 2-D ORB feature correspondences.
Camera and IMU alignment: Once the map is initialized, generate a set of camera poses using Structure From Motion (SFM). Then use corresponding IMU measurements to align the camera and IMU frames and to find the scale conversion between the IMU and camera odometries.
Tracking: Once the camera and IMU data are aligned and scaled correctly, initialize the factor graph. Then, for each new key frame, estimate the camera pose by matching features in the current frame to features in the last key frame. Refine the estimated camera pose by tracking the local map. Then, add the camera poses to the factor graph with their corresponding IMU measurements.
Local Mapping: Use the current frame to create new 3-D map points if it is identified as a key frame. At this stage, the factor graph is optimized, and the camera poses and 3-D points are adjusted using a set of local key frames.
Loop Closure: Detect loops for each key frame by comparing the key frame against all previous key frames using the bag-of-features approach. Once a loop closure is detected, add a new link to the factor graph to reflect the connection found by the loop closure.

Download the Input Data Sequence

The data used in this demo has been extracted from the Blackbird data set (NYC Subway Winter). It represents a sequence from a UAV flying around in a simulated subway environment. Download the MAT file which contains the images, camera intrinsics and IMU measurements.

uavData = helperDownloadSLAMData();

images     = uavData.images;
intrinsics = uavData.intrinsics;
timeStamps = uavData.timeStamps;

Set up the noise parameters using the factorIMUParameters object. These values are typically provided by the IMU manufacturer.

imuParams = factorIMUParameters(SampleRate=100,GyroscopeNoise=0.1, ...
            GyroscopeBiasNoise=3e-3,AccelerometerNoise=0.3, ...
            AccelerometerBiasNoise=1e-3,ReferenceFrame="ENU");

Initial Bias Estimation

A section of the input data sequence has only static frames. These frames are useful for estimating the initial bias of the IMU sensor, which you can then use as a prior when we construct the factor graph. Accounting for the bias helps to reduce pose estimation errors because it can serve as an initial guess during the optimization. This step is optional, as it depends on the availability of static frames.

startBiasIdx  = 40; %  Index from which the bias estimation starts
startFrameIdx = 150; % Index from which the SLAM pipeline starts
initBias = helperEstimateInitialBias(uavData,startBiasIdx,startFrameIdx,imuParams);

Map Initialization

We start by initializing the map which holds the 3-D world points. This step is crucial and has a significant impact on the accuracy of final SLAM result. Find the initial ORB feature point correspondences using the matchFeatures function between a pair of images. After the correspondences are found, you can use these two geometric transformation models to establish map initialization:

Homography: If the scene is planar, a homography projective transformation is a better choice to describe feature point correspondences.
Fundamental Matrix: If the scene is non-planar, a fundamental matrix must be used instead.

The homography and the fundamental matrix can be computed using estgeotform2d and estimateFundamentalMatrix, respectively. The model that results in a smaller reprojection error is selected to estimate the relative rotation and translation between the two frames using estrelpose. Because the RGB images are taken by a monocular camera which does not provide the depth information, the relative translation can only be recovered up to a specific scale factor.

Given the relative camera pose and the matched feature points in the two images, the 3-D locations of the matched points are determined using triangulateMultiview function. A triangulated map point is valid when it is in the front of both cameras, when its reprojection error is low, and when the parallax of the two views of the point is sufficiently large.

% Set random seed for reproducibility
rng(0);

% Detect and extract ORB features
scaleFactor = 1.2;
numLevels   = 8;
numPoints   = 5000;
numSkipFrames = 20;
numPointsKeyFrame = 200;

currI = images{1, startFrameIdx};
[preFeatures, prePoints] = helperDetectAndExtractFeatures(currI, scaleFactor, numLevels, numPoints, intrinsics); 

currFrameIdx = startFrameIdx + 1;
firstI       = currI; % Preserve the first frame 

isMapInitialized  = false;

% Map initialization loop
while ~isMapInitialized && currFrameIdx < size(images,2)

    currI = images{currFrameIdx};

    [currFeatures, currPoints] = helperDetectAndExtractFeatures(currI, scaleFactor, numLevels, numPoints, intrinsics); 
    
    currFrameIdx = currFrameIdx + 1;
    
    % Find putative feature matches
    indexPairs = matchFeatures(preFeatures, currFeatures, Unique=true, MaxRatio=0.5, MatchThreshold=80);

    % If not enough matches are found, check the next frame
    minMatches = 100;
    if size(indexPairs, 1) < minMatches
        continue
    end
    
    preMatchedPoints  = prePoints(indexPairs(:,1),:);
    currMatchedPoints = currPoints(indexPairs(:,2),:);
    
    % Compute homography and evaluate reconstruction
    [tformH, scoreH, inliersIdxH] = helperComputeHomography(preMatchedPoints, currMatchedPoints);

    % Compute fundamental matrix and evaluate reconstruction
    [tformF, scoreF, inliersIdxF] = helperComputeFundamentalMatrix(preMatchedPoints, currMatchedPoints);
    
    % Select the model based on a heuristic
    ratio = scoreH/(scoreH + scoreF);
    ratioThreshold = 0.45;
    if ratio > ratioThreshold
        inlierTformIdx = inliersIdxH;
        tform          = tformH;
    else
        inlierTformIdx = inliersIdxF;
        tform          = tformF;
    end

    % Computes the camera location up to scale. Use half of the 
    % points to reduce computation
    inlierPrePoints  = preMatchedPoints(inlierTformIdx);
    inlierCurrPoints = currMatchedPoints(inlierTformIdx);
    [relPose, validFraction] = estrelpose(tform, intrinsics, ...
        inlierPrePoints(1:2:end), inlierCurrPoints(1:2:end));
    
    % If not enough inliers are found, move to the next frame
    if validFraction < 0.9 || numel(relPose)~=1
        continue
    end
    
    % Triangulate two views to obtain 3-D map points
    minParallax = 1; % In degrees
    [isValid, xyzWorldPoints, inlierTriangulationIdx] = helperTriangulateTwoFrames(...
        rigidtform3d, relPose, inlierPrePoints, inlierCurrPoints, intrinsics, minParallax);
    
    if ~isValid
        continue
    end
    
    % Get the original index of features in the two key frames
    indexPairs = indexPairs(inlierTformIdx(inlierTriangulationIdx),:);
    
    isMapInitialized = true;
    
    disp(['Map initialized with frame ', num2str(startFrameIdx),' and frame ', num2str(currFrameIdx-1)])
end % End of map initialization loop

Map initialized with frame 150 and frame 214

if isMapInitialized
    % Show matched features
    hfeature = showMatchedFeatures(firstI, currI, prePoints(indexPairs(:,1)), ...
        currPoints(indexPairs(:, 2)), "Montage");
else
    error('Unable to initialize the map.')
end

Store Initial Key Frames and Map Points

After the map is initialized using two frames, you can use imageviewset and worldpointset to store the two key frames and the corresponding map points:

imageviewset stores the key frames and their attributes, such as ORB descriptors, feature points, camera poses, and graph connections between the key frames, such as feature points matches and relative camera poses. It also builds and updates a pose graph. The absolute camera poses and relative camera poses of odometry edges are stored as rigidtform3d objects. The relative camera poses of loop-closure edges are stored as affinetform3d objects.
worldpointset stores 3-D positions of the map points and the 3-D to 2-D projection correspondences, indicating which map points are observed in a key frame and which key frames observe a map point. It also stores other attributes of map points, such as the mean view direction, the representative ORB descriptors, and the range of distance at which the map point can be observed.

% Create an empty imageviewset object to store key frames
vSetKeyFrames = imageviewset;

% Create an empty worldpointset object to store 3-D map points
mapPointSet   = worldpointset;

% Add the first key frame. Place the camera associated with the first 
% key frame at the origin, oriented along the Z-axis
preViewId     = 1;
vSetKeyFrames = addView(vSetKeyFrames, preViewId, rigidtform3d, Points=prePoints,...
    Features=preFeatures.Features);

% Add the second key frame
currViewId    = 2;
vSetKeyFrames = addView(vSetKeyFrames, currViewId, relPose, Points=currPoints,...
    Features=currFeatures.Features);

% Add connection between the first and the second key frame
vSetKeyFrames = addConnection(vSetKeyFrames, preViewId, currViewId, relPose, Matches=indexPairs);

% Add 3-D map points
[mapPointSet, newPointIdx] = addWorldPoints(mapPointSet, xyzWorldPoints);

% Add observations of the map points
preLocations  = prePoints.Location;
currLocations = currPoints.Location;
preScales     = prePoints.Scale;
currScales    = currPoints.Scale;

% Add image points corresponding to the map points in the first key frame
mapPointSet   = addCorrespondences(mapPointSet, preViewId, newPointIdx, indexPairs(:,1));

% Add image points corresponding to the map points in the second key frame
mapPointSet   = addCorrespondences(mapPointSet, currViewId, newPointIdx, indexPairs(:,2));

Initialize Loop Closure Detection Database

Loop detection in this example is performed using the bags-of-words approach. You can create a visual vocabulary represented as a bagOfFeatures object offline using the ORB descriptors extracted from a large set of images in the dataset by calling:

bag = bagOfFeatures(imds,CustomExtractor=@helperORBFeatureExtractorFunction,TreeProperties=[3, 10],StrongestFeatures=1);

where imds is an imageDatastore object storing the training images and helperORBFeatureExtractorFunction is the ORB feature extractor function. See Image Retrieval with Bag of Visual Words for more information.

The loop closure process builds a database, by encode the incoming images into feature vectors compactly representing each view.

% Load the bag of features data created offline
bofData         = load("uavBoF.mat");
encodedImages   = [];

% Add features of the first two key frames to the database
encodedImages = [encodedImages; encode(bofData.bag,firstI,Verbose=false)];
encodedImages = [encodedImages; encode(bofData.bag,currI ,Verbose=false)];

Initialize the Factor Graph and Configure Trajectory and Map Visualization

Refine the initial reconstruction using a factorGraph (Navigation Toolbox), which optimizes both camera poses and world points to minimize the overall reprojection errors.

In MATLAB®, working with a factor graph involves managing a set of unique IDs for different parts of the graph, including: poses, 3D points or IMU measurements. By using these IDs, you can add additional constraints can be added between the variable nodes in the factor graph, such as the corresponding 2D image matches for a set of 3D points, or gyroscope measurements between two sets of poses. This is accomplished using factor nodes. You can generate unique IDs using the generateNodeID (Navigation Toolbox) function.

To initialize the factor graph, two pose variable nodes representing the first and second poses are combined with a factorCameraSE3AndPointXYZ (Navigation Toolbox) that represents the links between 2D feature points and 3D map points. At this stage, IMU measurements cannot be added to the graph yet, as the correct alignment and scaling of IMU data require several camera poses to be collected first.

Once the initialization is successful, update the attributes of the map points including 3-D locations, view direction, and depth range. You can use helperVisualizeMotionAndStructure to visualize the map points and the camera locations.

% Initialize the factor graph
%   - viewToNode : array to convert imageViewSet indices to factorGraph pose id
%   - pointToNode: array to convert worldPointSet indices to factorGraph 3d point id
[fGraph, refinedAbsPoses, refinedPoints, viewToNode, pointToNode] = helperInitFactorGraph(vSetKeyFrames, mapPointSet, intrinsics);

% Update key frames with the refined poses
vSetKeyFrames = updateView(vSetKeyFrames, refinedAbsPoses);
vSetKeyFrames = updateConnection(vSetKeyFrames, preViewId, currViewId, relPose);

% Update map points with the refined positions
mapPointSet = updateWorldPoints(mapPointSet, newPointIdx, refinedPoints);

% Update view direction and depth 
mapPointSet = updateLimitsAndDirection(mapPointSet, newPointIdx, vSetKeyFrames.Views);

% Update representative view
mapPointSet = updateRepresentativeView(mapPointSet, newPointIdx, vSetKeyFrames.Views);

% Update factor graph with refined poses 
fgRefinedPose = double([refinedAbsPoses.AbsolutePose(1).Translation rotm2quat(refinedAbsPoses.AbsolutePose(1).R)]);
nodeState(fGraph,1,fgRefinedPose);
fgRefinedPose = double([refinedAbsPoses.AbsolutePose(2).Translation rotm2quat(refinedAbsPoses.AbsolutePose(2).R)]);
nodeState(fGraph,2,fgRefinedPose);

% Update factor graph with refined points
ptsIds = nodeIDs(fGraph,NodeType="POINT_XYZ");
nodeState(fGraph,ptsIds,refinedPoints);

% Visualize matched features in the current frame
close(hfeature.Parent.Parent);
featurePlot   = helperVisualizeMatchedFeatures(currI, currPoints(indexPairs(:,2)));

% Visualize initial map points and camera trajectory
mapPlot       = helperVisualizeMotionAndStructureVI(vSetKeyFrames, mapPointSet);

showPlotLegend(mapPlot);

% ViewId of the current key frame
currKeyFrameId   = currViewId;

% ViewId of the last key frame
lastKeyFrameId   = currViewId;

% Index of the last key frame in the input image sequence
lastKeyFrameIdx  = currFrameIdx - 1; 

% Indices of all the key frames in the input image sequence
addedFramesIdx   = [1; lastKeyFrameIdx];

IMU and Camera Alignment and Scaling

While using a monocular camera, pose estimates are obtained at an unknown scale, and differ from the metric measurements collected by an IMU. To address this issue, accelerometer measurements are used to estimate a scale factor which can scale the input camera poses to match the metric scale of the IMU measurements. In addition to that, you must align the reference frames of the input poses with the local navigation reference frame of the IMU to remove the constant gravity acceleration that is captured by the accelerometer can be removed. The estimateGravityRotationAndPoseScale function estimates both the scale and gravity rotation which you can use to improve the accuracy of the sensor fusion.

The frames used to obtain this alignment and scaling should be collected while keeping in mind a set of criteria:

Map initialization should be as accurate as possible, because it affects the camera poses greatly. Extracting a large number of features may be necessary to obtain great accuracy.
Camera poses should contain both static and moving frames, with the moving frames containing a strong acceleration in the gravity axis direction.
For efficiency, a sliding window should be used to store the poses. The window size should be kept relatively small to reduce the amount of scale drift in the poses.

Once the estimation has converged, update the poses on the factor graph by rotating and scaling them correctly. Also update the factor graph by setting the bias prior using the previously estimated value. Use the factorIMU (Navigation Toolbox) to link the corresponding IMU measurements to the previously stored poses. Finally, optimize the factor graph to adjust the poses and 3D points according to the newly added constrains. Use factorGraphSolverOptions (Navigation Toolbox) to set up the optimization parameters for the factor graph.

% Main loop
isLastFrameKeyFrame = true;
isIMUInitialized    = false;
isLoopClosed        = false;

keyTimeStamps   = [timeStamps.imageTimeStamps(startFrameIdx); timeStamps.imageTimeStamps(currFrameIdx-1)];
keyFrameToFrame = [startFrameIdx; currFrameIdx-1];  % Array to convert keyframe ids to frame id
biasToNode      = []; % Array to convert imageViewSet indices to factorGraph bias id
velToNode       = []; % Array to convert imageViewSet indices to factorGraph velocity id

loopCtr = 0; % Loop closure rolling window counter
scale   = 0; % Placeholder for IMU-Camera scale

% Factor graph optimization parameters
fgso = factorGraphSolverOptions;
fgso.MaxIterations           = 20;
fgso.VerbosityLevel          = 0;
fgso.FunctionTolerance       = 1e-5;
fgso.GradientTolerance       = 1e-5;
fgso.StepTolerance           = 1e-5;
fgso.TrustRegionStrategyType = 0;

while currFrameIdx < size(images,2)

    if ~isIMUInitialized && currKeyFrameId>50
        camPoses = poses(vSetKeyFrames);

        % Estimate the gravity rotation and pose scale that helps in transforming input camera poses to the 
        % local navigation reference frame of IMU
        [gDir,scale] = helperAlignIMUCamera(camPoses,uavData,imuParams,...
            keyFrameToFrame,currKeyFrameId);

        if scale>0.3

            % Compare the scaled poses to the ground truth
            alignementPlot = helperDrawScaledandRotatedTraj(uavData,camPoses,scale,startFrameIdx,keyFrameToFrame);

            % Scale estimation was successful
            isIMUInitialized = true;
    
            xyzPoints1 = mapPointSet.WorldPoints;
            xyzPoints2 = nodeState(fGraph,nodeIDs(fGraph,NodeType="POINT_XYZ"));
    
            % Transform and scale the input camera poses and XYZ points using the estimated gravity direction and pose scale.
            [updatedCamPoses,updatedXYZPoints1,updatedXYZPoints2] = helperTransformToIMU(...
                camPoses,xyzPoints1,xyzPoints2,gDir,scale); %two sets of points because we can't remove nodes from graphs yet
    
            % Update mapPointSet and vSetKeyFrames
            vSetKeyFrames = updateView(vSetKeyFrames, updatedCamPoses);
            mapPointSet = updateWorldPoints(mapPointSet, (1:mapPointSet.Count), updatedXYZPoints1);
    
            % Update the poses and 3D points values in the factor graph
            for i=1:height(updatedCamPoses)
                pose = updatedCamPoses.AbsolutePose(i); 
                nodeState(fGraph, viewToNode(i), double([pose.Translation rotm2quat(pose.R)]));
            end
            nodeState(fGraph,nodeIDs(fGraph,NodeType="POINT_XYZ"),updatedXYZPoints2);
    
            % Update the factor graph by linking the IMU mesurements
            veloId = generateNodeID(fGraph,length(viewToNode));
            biasId = generateNodeID(fGraph,length(viewToNode));
    
            velToNode  = [velToNode;veloId'];
            biasToNode = [biasToNode;biasId'];
    
            initVelPrior  = factorVelocity3Prior(velToNode(1));
            initBiasPrior = factorIMUBiasPrior(biasToNode(1));
    
            addFactor(fGraph,initVelPrior);
            addFactor(fGraph,initBiasPrior);
    
            nodeState(fGraph, biasToNode(1), initBias);
    
            for i=2:length(viewToNode)
                imuMeasurements = helperExtractIMUMeasurements(uavData, keyFrameToFrame(i-1), keyFrameToFrame(i));
                imuId = [viewToNode(i-1),velToNode(i-1),biasToNode(i-1), ...
                        viewToNode(i)  ,velToNode(i)  ,biasToNode(i)];
                fIMU = factorIMU(imuId,imuMeasurements.gyro,imuMeasurements.accel,imuParams,SensorTransform=uavData.camToIMUTransform);
                addFactor(fGraph,fIMU);
            end
    
            optimize(fGraph,fgso);
            
        end
    end

Tracking

To determine when to insert a new key frame, perform the tracking process for each frame using these steps:

Extract ORB features for each new frame and then match (using matchFeatures), with features in the last key frame that have known corresponding 3-D map points.
Estimate the camera pose with the Perspective-n-Point algorithm using estworldpose.
Using the camera pose, project the map points observed by the last key frame into the current frame and search for feature correspondences using matchFeaturesInRadius.
With 3-D to 2-D correspondence in the current frame, refine the camera pose by performing a motion-only bundle adjustment using factorGraph. Construct a factor graph using the steps in the previous section and then fix the 3D point nodes before proceeding to the optimization step.
Project the local map points into the current frame to search for more feature correspondences using matchFeaturesInRadius and refine the camera pose again using motion-only bundle adjustment.
Determine if the current frame is a new key frame. If the current frame is a key frame, continue to the Local Mapping process. Otherwise, start Tracking for the next frame.

If tracking is lost because not enough number of feature points could be matched, try inserting new key frames more frequently.

    % Extract ORB features from the current image
    currI = images{1, currFrameIdx};
    [currFeatures, currPoints] = helperDetectAndExtractFeatures(currI, scaleFactor, numLevels, ...
        numPoints, intrinsics);

    % Track the last key frame
    [currPose, mapPointsIdx, featureIdx] = helperTrackLastKeyFrameVI(mapPointSet, ...
        vSetKeyFrames.Views, currFeatures, currPoints, lastKeyFrameId, intrinsics, scaleFactor);

    % Track the local map and check if the current frame is a key frame.
    [localKeyFrameIds, currPose, mapPointsIdx, featureIdx, isKeyFrame] = ...
        helperTrackLocalMapVI(mapPointSet, vSetKeyFrames, mapPointsIdx, ...
        featureIdx, currPose, currFeatures, currPoints, intrinsics, scaleFactor, numLevels, ...
        isLastFrameKeyFrame, lastKeyFrameIdx, currFrameIdx, numSkipFrames, numPointsKeyFrame);

    % Visualize matched features
    updatePlot(featurePlot, currI, currPoints(featureIdx));
    
    if ~isKeyFrame
        isLastFrameKeyFrame = false;
        currFrameIdx        = currFrameIdx + 1;
        continue
    else
        isLastFrameKeyFrame = true;
        keyFrameToFrame = [keyFrameToFrame; currFrameIdx];
        encodedImages   = [encodedImages; encode(bofData.bag,currI,Verbose=false)];
        keyTimeStamps   = [keyTimeStamps; timeStamps.imageTimeStamps(currFrameIdx)];
        loopCtr         = loopCtr+1;
    end


    % Update current key frame ID
    currKeyFrameId  = currKeyFrameId + 1;

    % Extract IMU measurements between two KFs
    imuMeasurements = helperExtractIMUMeasurements(uavData, keyFrameToFrame(end-1), keyFrameToFrame(end));
    
    % Add the new key frame and IMU data
    [fGraph, viewToNode, velToNode, biasToNode, mapPointSet, vSetKeyFrames] = helperAddNewKeyFrame(...
        fGraph, viewToNode, pointToNode, velToNode, biasToNode, mapPointSet, vSetKeyFrames, imuMeasurements, ...
        currPose, currFeatures, currPoints, mapPointsIdx, featureIdx, localKeyFrameIds, intrinsics, ...
        uavData.camToIMUTransform, imuParams, isIMUInitialized);
    
    % Remove outlier map points that are observed in fewer than 3 key frames
    [mapPointSet,pointToNode] = helperCullRecentMapPoints(mapPointSet, pointToNode, mapPointsIdx, newPointIdx);
    
    % Create new map points by triangulation
    minNumMatches = 10;
    minParallax   = 3;
    [fGraph, pointToNode, mapPointSet, vSetKeyFrames, newPointIdx] = helperCreateNewMapPointsVI(...
        fGraph, pointToNode, viewToNode, mapPointSet, vSetKeyFrames, ...
        currKeyFrameId, intrinsics, scaleFactor, minNumMatches, minParallax);

Loop Closure

Loop candidates are recognized by comparing feature descriptors, which represent encoded images using a pre-trained bag of words. The detected loop candidate is validated when it is located near a previous pose, and the distance between the feature descriptors associated to both poses is smaller than a set threshold.

When a valid loop candidate is found, use estgeotform3d to compute the relative pose between the loop candidate frame and the current key frame. Then add the loop connection with the relative pose to the factor graph.

    loopClosureCheck = 350; % Before this frame, loop closure does not occur. 
    % Check loop closure after some key frames have been created  
    if currKeyFrameId > loopClosureCheck && loopCtr>10

        % Minimum number of feature matches of loop edges
        loopEdgeNumMatches = 100;

        % Detect possible loop closure key frame candidates
        potentialLoopCandidates = helperFindPotentialLC(vSetKeyFrames);

        if ~isempty(potentialLoopCandidates)
            [isDetected, validLoopCandidates] = helperCheckLoopClosureVI(encodedImages, potentialLoopCandidates);
            if isDetected 
                % Add loop closure connections
                [fGraph, isLoopClosed, mapPointSet, vSetKeyFrames] = helperAddLoopConnectionsVI(fGraph, viewToNode, pointToNode,...
                    mapPointSet, vSetKeyFrames, validLoopCandidates, currKeyFrameId, ...
                    currFeatures, currPoints, loopEdgeNumMatches, intrinsics);
                if isLoopClosed
                    loopCtr=0;
                end
            end
        end

    end

Local Mapping

Local mapping is performed for every key frame. When a new key frame is determined, add it and its corresponding IMU measurement to the factor graph and update to the mapPointSet and vSetKeyFrames objects. To ensure that mapPointSet contains as few outliers as possible, a valid map point must be observed in at least 3 key frames.

Create new map points by triangulating ORB feature points in the current key frame and its connected key frames. For each unmatched feature point in the current key frame, search for a match with other unmatched points in the connected key frames using matchFeatures. The local optimization of the factor graph refines the pose of the current key frame, the poses of connected key frames, and all the map points observed in these key frames.

    % Local bundle adjustment
    [refinedViews, dist] = connectedViews(vSetKeyFrames, currKeyFrameId, MaxDistance=2);
    refinedKeyFrameIds = refinedViews.ViewId;
    fixedViewIds = refinedKeyFrameIds(dist==2);
    fixedViewIds = fixedViewIds(1:min(10, numel(fixedViewIds)));

    refinedKeyFrameIds = [refinedKeyFrameIds; currKeyFrameId];

    [fGraph, solInfo] = helperLocalFactorGraphOptimization(fGraph, viewToNode, refinedKeyFrameIds, fixedViewIds);

    allOptNodes=solInfo.OptimizedNodeIDs;
    optPointNodes=setdiff(allOptNodes,viewToNode(refinedKeyFrameIds));
    [~,mapPointIdx]=intersect(pointToNode,optPointNodes,'stable');

    % Update world points
    mapPointSet = updateWorldPoints(mapPointSet,mapPointIdx,nodeState(fGraph,pointToNode(mapPointIdx)));

    % Update view poses
    for i=1:length(refinedKeyFrameIds)
        viewId = refinedKeyFrameIds(i);
        fgPose = nodeState(fGraph,viewToNode(viewId));
        absPose = rigidtform3d(quat2rotm(fgPose(4:7)),fgPose(1:3));
        vSetKeyFrames = updateView(vSetKeyFrames,viewId,absPose);
    end

    % Update view direction and depth
    mapPointSet = updateLimitsAndDirection(mapPointSet, mapPointIdx, vSetKeyFrames.Views);
    
    % Update representative view
    mapPointSet = updateRepresentativeView(mapPointSet, mapPointIdx, vSetKeyFrames.Views);

    % Visualize 3D world points and camera trajectory
    updatePlot(mapPlot, vSetKeyFrames, mapPointSet);
    
    % Update IDs and indices
    lastKeyFrameId  = currKeyFrameId;
    lastKeyFrameIdx = currFrameIdx;
    addedFramesIdx  = [addedFramesIdx; currFrameIdx]; %#ok<AGROW>
    currFrameIdx    = currFrameIdx + 1;

end % End of main loop

Loop edge added between keyframe: 49 and 361
Loop edge added between keyframe: 112 and 426
Loop edge updated between keyframe: 127 and 443
Loop edge added between keyframe: 143 and 462

Global Optimization

Once all the frames have been processed, we proceed to optimize the factor graph one last time by including all the keyframes, 3D points and IMU measurements.

% Optimize the poses
vSetKeyFramesOptim = vSetKeyFrames;
fGraphOptim = fGraph;

optimize(fGraphOptim,fgso);

% Update map points after optimizing the poses
for i=1:vSetKeyFrames.NumViews
    fgPose = nodeState(fGraphOptim,viewToNode(i));
    absPose = rigidtform3d(quat2rotm(fgPose(4:7)),fgPose(1:3));
    vSetKeyFramesOptim = updateView(vSetKeyFramesOptim,i,absPose);
end

mapPointSet = helperUpdateGlobalMap(mapPointSet,vSetKeyFrames,vSetKeyFramesOptim);

updatePlot(mapPlot, vSetKeyFrames, mapPointSet);

% Plot the optimized camera trajectory
plotOptimizedTrajectory(mapPlot, poses(vSetKeyFramesOptim))

% Update legend
showPlotLegend(mapPlot);

% Load ground truth 
gTruth     = uavData.gTruth;

% Plot the actual camera trajectory 
plotActualTrajectory(mapPlot, uavData, keyFrameToFrame, gDir);

% Show legend
showPlotLegend(mapPlot);

Comparison Against the Ground Truth

The downloaded data contains the gTruth array, which you can use to compare the obtained trajectory with the ground truth to qualitatively evaluate the accuracy of the SLAM pipeline. You can also calculate the root-mean-square-error (RMSE) and absolute-mean-error (AME) of trajectory estimates.

[rmse, ame] = helperGetAccuracy(poses(vSetKeyFramesOptim),uavData,keyTimeStamps,gDir);

    "AME for trajectory (m): "    "0.36203"

    "RMSE for trajectory (m): "    "0.26287"

In this example you learned how to utilize the factor graph to combine pose estimates from a camera with IMU measurements to obtain more accurate results.

Supporting Functions

Helper functions related to factor graph processing are included below.

Other functions are included in separate files. For more information on these functions, please refer to the Monocular Visual Simultaneous Localization and Mapping example.

helperAddLoopConnectionsVI adds connections between the current keyframe and the valid loop candidate.

helperCheckLoopClosureVI detects key frames that are loop candidates by retrieving visually similar images from the database.

helperCreateNewMapPointsVI creates new map points by triangulation.

helperTrackLastKeyFrameVI estimates the current camera pose by tracking the last key frame.

helperTrackLocalMapVI refines the current camera pose by tracking the local map.

helperDetectAndExtractFeatures detects and extracts and ORB features from the image.

helperComputeHomography computes homography and evaluates reconstruction.

helperComputeFundamentalMatrix computes fundamental matrix and evaluates reconstruction.

helperTriangulateTwoFrames triangulates two frames to initialize the map.

helperCullRecentMapPoints culls recently added map points.

helperUpdateGlobalMap updates 3-D locations of map points after pose graph optimization

helperAlignIMUCamera estimates the gravity rotation and pose scale to transform the input poses to the local navigation reference frame of IMU using IMU measurements and factor graph optimization.

helperTransformToIMU transforms and scales input poses and XYZ 3-D points according to specified gravity direction and pose scale.

helperLocalFactorGraphOptimization optimizes the factor graph using a set of local key frames.

helperInitFactorGraph initializes the factor graph using the results obtained from SfM.

function [fGraph, refinedAbsPoses, refinedPoints, viewToNode, pointToNode]  = helperInitFactorGraph(vSetKeyFrames, mapPointSet, intrinsics)

    K = intrinsics.K;
    camInfo = ((K(1,1)/1.5)^2)*eye(2);

    refinedKeyFrameIds = vSetKeyFrames.Views.ViewId';
    
    graphIds = [];
    graphMes = [];
    graphPts = [];
    
    fGraph = factorGraph;
    
    % Generate the necessary IDs for poses and 3D points
    zeroId      = generateNodeID(fGraph,1);
    viewToNode  = generateNodeID(fGraph,vSetKeyFrames.NumViews)';
    pointToNode = generateNodeID(fGraph,mapPointSet.Count)';
    
    mapPointIdx = [];
    
    % Extract the data from mapPointSet to initialize the factor graph
    for i=1:length(refinedKeyFrameIds)
      viewId=refinedKeyFrameIds(i);
      [pointIndices,featureIndices]=findWorldPointsInView(mapPointSet,viewId);
    
      pointIds = pointToNode(pointIndices);
      poseIds  = ones(length(pointIds),1)*viewToNode(viewId);
      graphIds = [graphIds; [poseIds pointIds]];
    
      ptsInView   = vSetKeyFrames.Views.Points{viewId,1}.Location;
      graphMes = [graphMes; ptsInView(featureIndices,:)];
    
      mapPointIdx = [mapPointIdx; pointIndices];
      graphPts = [graphPts; mapPointSet.WorldPoints(pointIndices,:)];
    end
    
    [mapPointIdx, uniqueIndex, ~] = unique(mapPointIdx);
    graphPts = graphPts(uniqueIndex,:);
    
    % Set up a pose prior using the initial position and fix it
    initPos = double([vSetKeyFrames.Views.AbsolutePose(1,1).Translation rotm2quat(vSetKeyFrames.Views.AbsolutePose(1,1).R)]);
    initPosePrior = factorPoseSE3Prior(viewToNode(1),Measurement=initPos);
    addFactor(fGraph,initPosePrior);
    fixNode(fGraph,viewToNode(1));
    
    % Set up a factorCameraSE3AndPointXYZ using the matches extracted above
    fCam=factorCameraSE3AndPointXYZ(graphIds, K, Measurement=graphMes, ...
    Information=camInfo);
    addFactor(fGraph,fCam);
    nodeState(fGraph,pointToNode(mapPointIdx),double(graphPts));
    
    % Set up a pose node using the current position
    pose=vSetKeyFrames.Views.AbsolutePose(2,1);
    pose=double([pose.Translation rotm2quat(pose.R)]);
    nodeState(fGraph,viewToNode(2),pose);
    
    % Fix a few 3D points for more stability
    ptsIDS = nodeIDs(fGraph,NodeType="POINT_XYZ");
    idx = randsample(height(ptsIDS'),int32(height(ptsIDS')*0.05));
    fixNode(fGraph,ptsIDS(idx));
    
    optimize(fGraph,factorGraphSolverOptions);
    
    fixNode(fGraph,ptsIDS(idx),false);
    
    % Extract the results of the factor graph optimization and re-package them
    poseIDs  = nodeIDs(fGraph,NodeType="POSE_SE3");
    fgposopt = nodeState(fGraph,poseIDs);
    
    initPose    = rigidtform3d(quat2rotm(fgposopt(1,4:7)),fgposopt(1,1:3));
    refinedPose = rigidtform3d(quat2rotm(fgposopt(2,4:7)),fgposopt(2,1:3));
    
    ViewId=[1;2];
    AbsolutePose=[initPose;refinedPose];
    
    refinedAbsPoses=table(ViewId,AbsolutePose);
    
    ptsIDS = nodeIDs(fGraph,NodeType="POINT_XYZ");
    refinedPoints = nodeState(fGraph,ptsIDS);
end

helperAddNewKeyFrame adds key frame data to the factor graph.

function [fGraph, viewToNode, velToNode, biasToNode, mapPoints, vSetKeyFrames] = helperAddNewKeyFrame(...
    fGraph, viewToNode, pointToNode, velToNode, biasToNode, mapPoints, vSetKeyFrames, imuMeasurements, ...
    cameraPose, currFeatures, currPoints, mapPointsIndices, featureIndices, keyFramesIndices, ...
    intrinsics, camToIMUTransform, imuParams, initIMU)

    N = length(mapPointsIndices);
    K = intrinsics.K;
    camInfo = ((K(1,1)/1.5)^2)*eye(2);
    
    viewId = vSetKeyFrames.Views.ViewId(end)+1;
    
    vSetKeyFrames = addView(vSetKeyFrames, viewId, cameraPose,...
        'Features', currFeatures.Features, ...
        'Points', currPoints);
    
    viewsAbsPoses = vSetKeyFrames.Views.AbsolutePose;
    
    % Generate the pose ID
    poseId = generateNodeID(fGraph,1);
    viewToNode = [viewToNode;poseId];
    
    for i = 1:numel(keyFramesIndices)
        localKeyFrameId = keyFramesIndices(i);
        [index3d, index2d] = findWorldPointsInView(mapPoints, localKeyFrameId);
        [~, index1, index2] = intersect(index3d, mapPointsIndices, 'stable');
        
        prePose   = viewsAbsPoses(localKeyFrameId);
        relPose = rigidtform3d(prePose.R' * cameraPose.R, ...
            (cameraPose.Translation-prePose.Translation)*prePose.R);
        
        if numel(index1) > 5
            vSetKeyFrames = addConnection(vSetKeyFrames, localKeyFrameId, viewId, relPose, ...
                Matches=[index2d(index1),featureIndices(index2)]);
    
            % Add a 3d pose factor to link both frames in the local map
            fgRelPose=double([relPose.Translation rotm2quat(relPose.R)]);
            fPose=factorTwoPoseSE3([viewToNode(localKeyFrameId) viewToNode(viewId)],Measurement=fgRelPose);
            addFactor(fGraph,fPose);
        end
    end
    
    mapPoints = addCorrespondences(mapPoints, viewId, mapPointsIndices, ...
        featureIndices);
    
    % Generate the IDs to be used in the factorCameraSE3AndPointXYZ
    ptsIds  = pointToNode(mapPointsIndices);
    poseIds = ones(N,1)*poseId;
    
    ptsInView = vSetKeyFrames.Views.Points{viewId,1}.Location;
    mesPts = ptsInView(featureIndices,:);
    
    % Add camera factor to the factor graph
    fCam = factorCameraSE3AndPointXYZ([poseIds ptsIds], K, Measurement=mesPts, ...
        Information=camInfo);
    addFactor(fGraph,fCam);
    
    % Add pose node to the factor graph
    pose = double([cameraPose.Translation rotm2quat(cameraPose.R)]);
    nodeState(fGraph,poseId,pose);
    
    % If the camera poses have been scaled properly, then add the IMU
    % factor to the factor graph
    if initIMU
        velId = generateNodeID(fGraph,1);
        velToNode=[velToNode;velId];
    
        biasId = generateNodeID(fGraph,1);
        biasToNode=[biasToNode;biasId];
    
        imuId = [viewToNode(end-1),velToNode(end-1),biasToNode(end-1), ...
                 viewToNode(end)  ,velToNode(end)  ,biasToNode(end)];
        fIMU = factorIMU(imuId,imuMeasurements.gyro,imuMeasurements.accel,imuParams,SensorTransform=camToIMUTransform);
        addFactor(fGraph,fIMU);
    end
end

References

[1] Dellaert, Frank, and Michael Kaess. Factor Graphs for Robot Perception. Now Publishers, 2017.

[2] Campos, Carlos, et al. “ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap SLAM.” IEEE Transactions on Robotics, vol. 37, no. 6, Dec. 2021, pp. 1874–90. DOI.org (Crossref), https://doi.org/10.1109/TRO.2021.3075644.