
inflated3dVideoClassifier

Inflated-3D (I3D) video classifier. Requires Computer Vision Toolbox Model for Inflated-3D Video Classification

Since R2021b

Description

The inflated3dVideoClassifier object is an Inflated-3D (I3D) video classifier pretrained on the Kinetics-400 data set. You can use the pretrained video classifier to classify 400 human actions, such as running, walking, and shaking hands. The I3D classifier model contains two subnetworks: the video network and the optical flow network. Both of these networks are trained on Kinetics-400 with RGB data and optical flow data respectively.

Creation

Description

i3d = inflated3dVideoClassifier returns the I3D video classifier pretrained on the Kinetics-400 data set.

i3d = inflated3dVideoClassifier(classifierName,classes) configures the pretrained Inflated-3D (I3D) video classifier for transfer learning on a new set of classes, classes, using one of two pretrained classifiers, specified by classifierName.


i3d = inflated3dVideoClassifier(___,Name=Value) sets properties using name-value arguments in addition to the input arguments from the previous syntax. For example, i3d = inflated3dVideoClassifier("googlenet-video",["wavingHello","clapping"],InputSize=[224,224,3,64]) sets the input size of the network to 64 frames of 224-by-224 pixels with 3 channels. You can specify multiple name-value arguments.

Note

This object requires the Computer Vision Toolbox™ Model for Inflated-3D Video Classification. You can install the Computer Vision Toolbox Model for Inflated-3D Video Classification from Add-On Explorer. For more information about installing add-ons, see Get and Manage Add-Ons. To use this object, you must have a license for the Deep Learning Toolbox™.

Input Arguments


Classifier name, specified as "googlenet-video" or "googlenet-video-flow".

Classifier                Description
"googlenet-video"         GoogLeNet-based I3D model pretrained on the Kinetics-400 video data for transfer learning.
"googlenet-video-flow"    GoogLeNet-based I3D model pretrained on the Kinetics-400 video and optical flow data for transfer learning. During training and inference, both video and optical flow data are used for classification.

Properties


Configure Classifier Properties

This property is read-only.

Input size of the video classifier network, specified as a four-element row vector in the form [H,W,C,T], where H and W represent the height and width respectively, C represents the number of channels, and T represents the number of frames for the video subnetwork.

The input size of the flow subnetwork is equal in height, width, and number of frames, but the number of channels is fixed to 2.

Typical values for the number of frames are 8, 16, 32, or 64. Increase the number of frames to capture the temporal nature of activities when training the classifier. When you are using optical flow data, the number of channels must be 2, corresponding to the x- and y-components of velocity.
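The relationship between the two input sizes can be sketched in plain MATLAB. This is only an illustration of the rule stated above; the variable names inputSize and flowInputSize are hypothetical and are not properties read from the object.

```matlab
% Hypothetical video subnetwork input size in the form [H,W,C,T].
inputSize = [224 224 3 64];

% The flow subnetwork shares height, width, and number of frames,
% but its channel count is fixed to 2 (x- and y-components of velocity).
flowInputSize = inputSize;
flowInputSize(3) = 2;

% Number of elements in one input sequence for each subnetwork.
numVideoElements = prod(inputSize);
numFlowElements = prod(flowInputSize);
```

Because only the channel dimension differs, one optical flow sequence holds two thirds as many elements as the matching RGB video sequence.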

This property is read-only.

Normalization statistics for the video and optical flow data, specified as a structure with the fields Video and OpticalFlow. Each of these fields is itself a structure with the fields Min, Max, Mean, and StandardDeviation. The Min and Max field values define the minimum and maximum values for rescaling the video and optical flow data. The Mean and StandardDeviation field values define the mean and standard deviation for input normalization. Specify each field value as a row vector with one element per channel of the corresponding input data. For optical flow data, the number of channels must be 2, corresponding to the x- and y-components of velocity.

The default structure contains:

  • A Video field, which contains the field Min set to [0,0,0], and the field Max set to [255,255,255].

  • Empty OpticalFlow, Mean, and StandardDeviation field values.

For a video input, the data is rescaled between -1 and 1 using the Min and Max field values. For an optical flow input, the data is rescaled between -1 and 1 using computed minimum and maximum values from the input data.

Note

When the Min and Max field values are not empty, the object first rescales the input data between -1 and 1. Then, if the Mean and StandardDeviation field values are not empty, the object normalizes the rescaled values by subtracting the mean and dividing by the standard deviation.

An example using this property:

stats.Video = struct(Min=[0,0,0],Max=[255,255,255], ...
    Mean=[],StandardDeviation=[]);
stats.OpticalFlow = struct(Min=[-20,-20],Max=[20,20], ...
    Mean=[],StandardDeviation=[]);
i3d = inflated3dVideoClassifier("googlenet-video-flow",["waving","clapping"], ...
    InputNormalizationStatistics=stats);
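The two-step order described in the note (rescale first, then normalize if statistics are present) can be sketched in plain MATLAB. This is a minimal illustration, not the object's internal code: it assumes the rescaling is a linear min-max mapping onto [-1,1], and the frame values and variable names are hypothetical.

```matlab
% Hypothetical 2-by-2 RGB frame with values in [0,255].
frame = cat(3,[0 255; 127.5 51],[255 0; 51 127.5],[127.5 127.5; 0 255]);

% Per-channel statistics reshaped to 1-by-1-by-C so they expand over H and W.
minVal = reshape([0 0 0],1,1,[]);
maxVal = reshape([255 255 255],1,1,[]);

% Step 1: rescale each channel from [Min,Max] to [-1,1]
% (assumed linear mapping).
rescaled = 2*(frame - minVal)./(maxVal - minVal) - 1;

% Step 2: normalize only when both Mean and StandardDeviation are set.
meanVal = [];   % empty, as in the default structure
stdVal = [];
if ~isempty(meanVal) && ~isempty(stdVal)
    rescaled = (rescaled - reshape(meanVal,1,1,[]))./reshape(stdVal,1,1,[]);
end
```

With the default empty Mean and StandardDeviation values, the second step is skipped and the output stays in the range [-1,1].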

Name of the trained video classifier, specified as a string scalar.

This property is read-only.

Classes that the video classifier is configured to train or classify, specified as a vector of strings or a cell array of character vectors. For example:

classes = ["kiss","laugh","pick","pour","pushup"];
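Each class name must be a separate element. As a short illustration in plain MATLAB (the variable names are hypothetical), the string-array and cell-array forms below both hold five names, whereas square brackets around character vectors concatenate them into a single character vector.

```matlab
% Vector of strings: one element per class.
classesString = ["kiss","laugh","pick","pour","pushup"];

% Cell array of character vectors: also one element per class.
classesCell = {'kiss','laugh','pick','pour','pushup'};

% Square brackets around character vectors concatenate them into ONE
% character vector, which is not a list of five classes.
joined = ['kiss','laugh','pick','pour','pushup'];

numString = numel(classesString);               % number of class names
sameNames = isequal(classesString,string(classesCell));
```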

Training Properties

Learnable parameters for the video subnetwork of the I3D video classifier, specified as a table with three columns.

  • Layer — Layer name, specified as a string scalar.

  • Parameter — Parameter name, specified as a string scalar.

  • Value — Parameter value, specified as a dlarray (Deep Learning Toolbox) object.

The network learnable parameters contain the features learned by the network, such as the weights of convolution and fully connected layers.
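The three-column layout can be illustrated with a small hand-built table. This is a sketch only: the layer names, parameter names, and array sizes are hypothetical and are not taken from the I3D network, and it assumes that the Value column stores one dlarray per row in a cell array. It requires Deep Learning Toolbox for dlarray.

```matlab
% Hypothetical layer and parameter names (not the real I3D layers).
Layer = ["conv1";"conv1";"fc"];
Parameter = ["Weights";"Bias";"Weights"];

% One dlarray per row, stored in a cell column (assumed layout).
Value = {dlarray(rand(3,3,3,3,8)); dlarray(zeros(1,1,1,8)); dlarray(rand(4,8))};

learnables = table(Layer,Parameter,Value);

% Select only the rows that hold weights.
isWeights = learnables.Parameter == "Weights";
weightsOnly = learnables(isWeights,:);

% Count the total number of learnable scalars across all rows.
numParams = sum(cellfun(@(v) numel(extractdata(v)),learnables.Value));
```

Logical indexing on the Layer or Parameter column is a convenient way to inspect or update a subset of parameters, for example when applying different learning rates to different layers.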

State of the nonlearnable parameters for the video subnetwork of the I3D video classifier, specified as a table with three columns.

  • Layer — Layer name, specified as a string scalar.

  • Parameter — Parameter name, specified as a string scalar.

  • Value — Parameter value, specified as a dlarray (Deep Learning Toolbox) object.

The network state contains information remembered by the network between iterations, such as the state of LSTM and batch normalization layers. During training or inference, you can update the network state using the output of the forward and predict functions.

Learnable parameters for the optical flow subnetwork of the I3D video classifier, specified as a table with three columns.

  • Layer — Layer name, specified as a string scalar.

  • Parameter — Parameter name, specified as a string scalar.

  • Value — Parameter value, specified as a dlarray (Deep Learning Toolbox) object.

The network learnable parameters contain the features learned by the network, such as the weights of convolution and fully connected layers.

State of the nonlearnable parameters for the optical flow subnetwork of the I3D video classifier, specified as a table with three columns.

  • Layer — Layer name, specified as a string scalar.

  • Parameter — Parameter name, specified as a string scalar.

  • Value — Parameter value, specified as a dlarray (Deep Learning Toolbox) object.

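The network state contains information remembered by the network between iterations, such as the state of LSTM and batch normalization layers. During training or inference, you can update the network state using the output of the forward and predict functions.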

Streaming Video Classification Properties

This property is read-only.

Video sequence used to update and classify sequences for streaming classification, specified as a 4-D numeric array of size H-by-W-by-C-by-T, where H and W represent the height and width respectively, C represents the number of channels, and T represents the number of frames for the video subnetwork. The updateSequence and classifySequence object functions use the video sequence specified by the VideoSequence property.

This property is read-only.

Optical flow sequence used to update and classify sequences for streaming classification, specified as a 4-D numeric array of size H-by-W-by-C-by-T, where H and W represent the height and width respectively, C represents the number of channels, and T represents the number of frames for the optical flow subnetwork. The updateSequence and classifySequence object functions use the optical flow sequence specified by the OpticalFlowSequence property.
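To make the H-by-W-by-C-by-T layout concrete, the sketch below maintains a fixed-length frame buffer in plain MATLAB as new frames arrive. It illustrates the data layout only and is not the implementation of updateSequence; the sliding-window policy, the frame contents, and all variable names are assumptions.

```matlab
% Hypothetical sequence size [H,W,C,T]: 16 frames of 8-by-8 RGB data.
inputSize = [8 8 3 16];
sequence = zeros(inputSize,"single");

% Simulate 20 incoming frames; frame k is filled with the value k.
for k = 1:20
    frame = k*ones(inputSize(1:3),"single");

    % Assumed sliding window: drop the oldest frame and append the
    % newest one along the fourth (time) dimension.
    sequence = cat(4,sequence(:,:,:,2:end),frame);
end

oldestFrameId = sequence(1,1,1,1);     % frame 5 after 20 updates
newestFrameId = sequence(1,1,1,end);   % frame 20
```

The buffer keeps a constant size, so after more than T updates it always holds the T most recent frames.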

Object Functions


classifyVideoFile    Classify a video file
resetSequence        Reset video and optical flow sequence properties for streaming video classification
updateSequence       Update video or optical flow sequence for classification
classifySequence     Classify video and optical flow sequence
forward              Compute video classifier outputs for training
predict              Compute video classifier predictions

Examples

This example shows how to use classifyVideoFile to classify a video using an Inflated-3D video classifier.

Load a pretrained Inflated-3D video network.

i3d = inflated3dVideoClassifier();

Specify the video file name to classify.

videoFilename = 'visiontraffic.avi';

Classify the video using the video classifier.

label = classifyVideoFile(i3d, videoFilename);

The pretrained classifier is not fine-tuned for visiontraffic.avi, so the predicted label will not be correct. For optimal performance, train the classifier on your own video data.

Version History

Introduced in R2021b