fast_sorted_mask

fast masking of ordered vectors based on binary search
17 descargas
Actualizado 2 dic 2021

# fast_sorted_mask
**Bryce M. Henson**
Matlab code for fast masking/selection of ordered vectors based on binary search.
**Status:** This Code **is ready for use in other projects**. Testing is implemented and passing.

Selecting the elements of a vector that lie between some limits (herein *masking*, sometimes known as 'gating') is a widely used analytical tool (eg. particle physics).
The common approach to masking data involves comparing each element (n) to the upper and lower limit (herein *Brute compare*) has complexity O(n). The novel contribution of this code is a demonstration of a relatively simple approach that uses a binary search algorithm ( O(log(n)) ) on an ordered vector to achieve superior performance in two use cases.
1. Data that is already sorted ( O(log(n)) cf. brute O(n) ).
2. When there is a requirement to repeatedly (m times) mask the same data such that the inial cost of the sort is offset by the increased speed of the mask operation. (O(n·log(n)+m·log(n)) cf. O(n·m) )
Note that if m is small and you check that the data is ordered (eg issorted(data)) you have probably lost most of any potential speedup already.

There are two things that the user may want from this masking operation:
1. Returning the number of data points(or counts) in this mask (herein *return mask count*).
* This is where the search based algorithm really shines compared to the brute mask (as it just subtracts the upper and lower index while the brute compare must count up the logical vector, see details).
2. Return the values of the data that makes it through this mask. (herein *retun mask values*)
* This has a smaller speedup because copying a subset of a vector (even a contiguous block) is a substantially slower than the search.

The code here demonstrates the algorithm for both types (counting and subset) in native matlab and provides a number of tests in order to compare the performance.
For taking small slices of large (>1e6 elements) sorted vectors, a speedup of 1000x is demonstrated.

## Usage Examples
The brute compare implementation is very easy by using a logical mask vector:
```
mask=data>lower_lim & data<upper_lim
```
(If you are not familaiar with Logical indexing [read more here](https://blogs.mathworks.com/loren/2013/02/20/logical-indexing-multiple-conditions/) )
the number of counts (integer) may be extracted as
```
num_mask=sum(mask)
```
or the subset of data points (vector)
```
subset_mask=data(mask)
```
The equivelent (but faster) operation using fast_sorted_mask on ordered data is:
```
mask_idx=fast_sorted_mask(data,lower_lim,upper_lim);
num_mask=mask_idx(2)-mask_idx(1)+1;
subset_mask=data(mask_idx(1):mask_idx(2));
```
WARNING: the data vector MUST be sorted. See figures above for when it is still advantagous to sort unordered data and then use this approach.

Citar como

Bryce Henson (2024). fast_sorted_mask (https://github.com/brycehenson/fast_sorted_mask), GitHub. Recuperado .

Compatibilidad con la versión de MATLAB
Se creó con R2019a
Compatible con cualquier versión
Compatibilidad con las plataformas
Windows macOS Linux

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

No se pueden descargar versiones que utilicen la rama predeterminada de GitHub

Versión Publicado Notas de la versión
1.2.4

Fix logo labels

1.2.3

corrected description

1.2.2

added logo

1.2.1

include acknowledgment "Smart"/Silent Figure by Daniel Eaton

1.2

Para consultar o notificar algún problema sobre este complemento de GitHub, visite el repositorio de GitHub.
Para consultar o notificar algún problema sobre este complemento de GitHub, visite el repositorio de GitHub.