File Exchange

image thumbnail

fast_sorted_mask

version 1.2.4 (686 KB) by Bryce Henson
fast masking of ordered vectors based on binary search

0 Downloads

Updated 16 Apr 2019

GitHub view license on GitHub

# fast_sorted_mask
**Bryce M. Henson**
Matlab code for fast masking/selection of ordered vectors based on binary search.
**Status:** This Code **is ready for use in other projects**. Testing is implemented and passing.

Selecting the elements of a vector that lie between some limits (herein *masking*, sometimes known as 'gating') is a widely used analytical tool (eg. particle physics).
The common approach to masking data involves comparing each element (n) to the upper and lower limit (herein *Brute compare*) has complexity O(n). The novel contribution of this code is a demonstration of a relatively simple approach that uses a binary search algorithm ( O(log(n)) ) on an ordered vector to achieve superior performance in two use cases.
1. Data that is already sorted ( O(log(n)) cf. brute O(n) ).
2. When there is a requirement to repeatedly (m times) mask the same data such that the inial cost of the sort is offset by the increased speed of the mask operation. (O(n·log(n)+m·log(n)) cf. O(n·m) )
Note that if m is small and you check that the data is ordered (eg issorted(data)) you have probably lost most of any potential speedup already.

There are two things that the user may want from this masking operation:
1. Returning the number of data points(or counts) in this mask (herein *return mask count*).
* This is where the search based algorithm really shines compared to the brute mask (as it just subtracts the upper and lower index while the brute compare must count up the logical vector, see details).
2. Return the values of the data that makes it through this mask. (herein *retun mask values*)
* This has a smaller speedup because copying a subset of a vector (even a contiguous block) is a substantially slower than the search.

The code here demonstrates the algorithm for both types (counting and subset) in native matlab and provides a number of tests in order to compare the performance.
For taking small slices of large (>1e6 elements) sorted vectors, a speedup of 1000x is demonstrated.

## Usage Examples
The brute compare implementation is very easy by using a logical mask vector:
```
mask=data>lower_lim & data<upper_lim
```
(If you are not familaiar with Logical indexing [read more here](https://blogs.mathworks.com/loren/2013/02/20/logical-indexing-multiple-conditions/) )
the number of counts (integer) may be extracted as
```
num_mask=sum(mask)
```
or the subset of data points (vector)
```
subset_mask=data(mask)
```
The equivelent (but faster) operation using fast_sorted_mask on ordered data is:
```
mask_idx=fast_sorted_mask(data,lower_lim,upper_lim);
num_mask=mask_idx(2)-mask_idx(1)+1;
subset_mask=data(mask_idx(1):mask_idx(2));
```
WARNING: the data vector MUST be sorted. See figures above for when it is still advantagous to sort unordered data and then use this approach.

Cite As

Bryce Henson (2019). fast_sorted_mask (https://www.github.com/brycehenson/fast_sorted_mask), GitHub. Retrieved .

Comments and Ratings (0)

Updates

1.2.4

Fix logo labels

1.2.3

corrected description

1.2.2

added logo

1.2.1

include acknowledgment "Smart"/Silent Figure by Daniel Eaton

MATLAB Release Compatibility
Created with R2019a
Compatible with any release
Platform Compatibility
Windows macOS Linux