You can use the Computer Vision Toolbox™ functions to search by image, a capability also known as content-based image retrieval (CBIR). A CBIR system retrieves images from a collection that are visually similar to a query image. Applications for these systems include web-based product search, surveillance, and visual place identification.
The retrieval system uses a bag of visual words, a collection of image descriptors, to represent your data set of images. Images are indexed to create a mapping of visual words: the index maps each visual word to its occurrences in the image set. Comparing the query image against the index yields the images most similar to the query. By using the CBIR system workflow, you can evaluate the accuracy of the search results for a query with a known set of correct matches.
Create an image set that represents image features for retrieval. Use imageDatastore to store the image data. Use a large number of images that represent various viewpoints of the object. A large and diverse set of images helps train the bag of visual words and increases the accuracy of the image search.
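A minimal sketch of this step, assuming the image collection lives in a local folder named bookCovers (a hypothetical path; point it at your own images):

```matlab
% Gather the image collection into a datastore.
% 'bookCovers' is a hypothetical folder name.
imgSet = imageDatastore('bookCovers','IncludeSubfolders',true);
```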
Type of feature. The indexImages function creates the bag of visual words using speeded-up robust features (SURF). For other types of features, you can use a custom extractor, and then use bagOfFeatures to create the bag of visual words. See the Create Search Index Using Custom Bag of Features example.
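For the custom-extractor path, the call might look like this sketch. The extractor function name here is hypothetical; its required signature is described in the bagOfFeatures documentation:

```matlab
% Build a bag of visual words from a user-supplied feature extractor.
% myFeatureExtractor is a hypothetical function you must define yourself.
extractor = @myFeatureExtractor;
bag = bagOfFeatures(imgSet,'CustomExtractor',extractor);
```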
You can use the original imgSet or a different collection of images for the training set. To use a different collection, create the bag of visual words before creating the image index, using the bagOfFeatures function. The advantage of using the same set of images is that the visual vocabulary is tailored to the search set. The disadvantage is that the retrieval system must relearn the visual vocabulary to use it on a drastically different set of images. With an independent training set, the visual vocabulary better handles the addition of new images to the search index.
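The independent-training-set approach can be sketched as follows, assuming a separate, hypothetical folder named trainingImages:

```matlab
% Learn the visual vocabulary from an independent image collection,
% then reuse it when indexing the search set.
trainingSet = imageDatastore('trainingImages');  % hypothetical folder
bag = bagOfFeatures(trainingSet);                % SURF-based vocabulary
```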
Index the images. The indexImages function creates a search index that maps visual words to their occurrences in the image collection. When you create the bag of visual words using an independent or subset collection, include the bag as an input argument to indexImages. If you do not create an independent bag of visual words, then the function creates the bag based on the entire imgSet input collection. You can add and remove images directly to and from the image index using the addImages and removeImages methods.
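The indexing step, including a later addition of new images, might look like this sketch (the newImages folder is hypothetical):

```matlab
% Create the search index; without a bag input, indexImages builds
% a SURF-based bag of visual words from imgSet itself.
imageIndex = indexImages(imgSet);

% Later, fold additional images into the existing index.
newImgs = imageDatastore('newImages');  % hypothetical folder
addImages(imageIndex,newImgs);
```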
Search the data set for similar images. Use the retrieveImages function to search the image set for images that are similar to the query image. Use the NumResults name-value argument to control the number of results. For example, set NumResults to 10 to return the top 10 similar images. Use the ROI name-value argument to search using a smaller region of the query image. A smaller region is useful for isolating a particular object in an image that you want to search for.
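A sketch of the search step; the query image file name and ROI coordinates are hypothetical:

```matlab
% Retrieve the 10 most similar images, searching only within a
% region of interest of the query image ([x y width height]).
queryImage = imread('queryBook.jpg');   % hypothetical query image
roi = [100 50 200 300];                 % hypothetical region of interest
imageIDs = retrieveImages(queryImage,imageIndex, ...
    'NumResults',10,'ROI',roi);
```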
Use the evaluateImageRetrieval function to evaluate image retrieval by using a query image with a known set of results. If the results are not what you expect, you can modify or augment the image features used by the bag of visual words. Examine the type of features retrieved. The type of feature used for retrieval depends on the type of images within the collection. For example, if you are searching an image collection made up of scenes, such as beaches, cities, or highways, use a global image feature. A global image feature, such as a color histogram, captures the key elements of the entire scene. To find specific objects within the image collection, use local image features extracted around object keypoints instead.
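Evaluation might look like this sketch, where the expected image IDs are hypothetical known-correct matches for the query:

```matlab
% Score retrieval quality for a query with known relevant images.
expectedIDs = [4 8 15];  % hypothetical indices of known-correct images
averagePrecision = evaluateImageRetrieval(queryImage,imageIndex,expectedIDs);
```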