bioinfo.pipeline.block.SeqFilter
Description
A SeqFilter block enables you to filter sequences based on a specified criterion.
Creation
Syntax
Description
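b = bioinfo.pipeline.block.SeqFilter creates a SeqFilter block.
b = bioinfo.pipeline.block.SeqFilter(options) also specifies additional options using options.
b = bioinfo.pipeline.block.SeqFilter(Name=Value) specifies additional options as name-value arguments that set the property values of a SeqFilterOptions object. This object is set as the value of the Options property of the block.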
Note
The block always overwrites existing output files, unlike the seqfilter function.
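The creation syntaxes above can be sketched as follows. This is a minimal illustration, assuming that Method and Threshold accept the values shown; the threshold of 30 is an arbitrary example, and the options object is taken from an existing block rather than constructed directly, so that only names appearing on this page are used.

```matlab
import bioinfo.pipeline.block.*

% Syntax 1: block with default options.
b1 = SeqFilter;

% Syntax 3: name-value arguments set properties of the SeqFilterOptions
% object stored in b2.Options. The threshold of 30 is an arbitrary example.
b2 = SeqFilter(Method="MinLength", Threshold=30);

% Syntax 2: pass a SeqFilterOptions object. Here it is taken from b2
% instead of being constructed directly (an assumption for illustration).
opts = b2.Options;
b3 = SeqFilter(opts);

% Inspect the resulting options.
disp(b3.Options.Method)
disp(b3.Options.Threshold)
```

Because the name-value arguments are stored in the Options property, you can also change them after creation with dot notation on b.Options.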
Input Arguments
options – SeqFilter options, specified as a SeqFilterOptions object.
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Note
The following list of arguments is a partial list. For the complete list, refer to the properties of the SeqFilterOptions object.
Method – Criterion to filter sequences, specified as one of the following options. Specify only one filtering criterion per function call.
- 'MaxNumberLowQualityBases' – applies a maximum threshold on the number of low-quality bases allowed.
- 'MaxPercentLowQualityBases' – applies a maximum threshold on the percentage of low-quality bases allowed.
- 'MeanQuality' – applies a minimum threshold on the average base quality across each sequence.
- 'MinLength' – applies a minimum threshold on the sequence length.
Use this name-value pair argument together with 'Threshold' to specify the appropriate threshold value. Depending on the filtering criterion, the corresponding value for 'Threshold' can be a scalar or two-element vector. See the 'Threshold' option for the default values. If you do not specify 'Threshold', then the function uses the default threshold value of the specified method. For each filtering criterion, the function uses the base quality encoding format specified by the 'Encoding' name-value pair argument.
Threshold – Threshold value for the filtering criterion, specified as a scalar or vector. Use this name-value pair to define the threshold value for the filtering criterion specified by 'Method'.
Depending on the filtering criterion, the corresponding value for 'Threshold' can be a scalar or two-element vector. If you do not specify 'Threshold', then the function uses the default threshold value of the corresponding method. For each filtering criterion, the function uses the encoding format of the base quality specified by the 'Encoding' name-value pair argument.
| 'Method' | 'Threshold' | Default 'Threshold' value | 
|---|---|---|
| 'MaxNumberLowQualityBases' | Two-element vector [V1 V2]. V1 is a nonnegative integer that specifies the maximum number of low-quality bases allowed. V2 specifies the minimum base quality. Any base with quality less than V2 is considered a low-quality base. Any sequence containing a number of low-quality bases greater than V1 is filtered out and not saved in the output file. | [0 10] | 
| 'MaxPercentLowQualityBases' | Two-element vector [V1 V2]. V1 is a scalar between 0 and 100 that specifies the maximum percentage of low-quality bases allowed. V2 specifies the minimum base quality. Any base with quality less than V2 is considered a low-quality base. Any sequence containing a percentage of low-quality bases greater than V1 is filtered out and not saved in the output file. | [0 10] | 
| 'MeanQuality' | Positive scalar that specifies the minimum threshold on the average base quality across each sequence. Any sequence with average base quality less than this value is filtered out. | 0 | 
| 'MinLength' | Nonnegative integer that specifies the minimum threshold on the sequence length allowed. Any sequence with length less than this value is filtered out. | 1 | 
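The decision rules in the table can be sketched in plain MATLAB for a single read. This is only an illustration of the documented threshold semantics, not the toolbox implementation; the quality scores and threshold values below are hypothetical and chosen so that each rule is easy to check by hand.

```matlab
% Hypothetical per-base quality scores for one read (illustrative values only).
q = [30 32 8 25 12 40 35 9 28 31];

% 'MaxNumberLowQualityBases' with Threshold = [V1 V2] = [2 10]:
% keep the read unless more than V1 bases have quality below V2.
numLow = sum(q < 10);                   % 2 bases (scores 8 and 9)
keepMaxNumber = numLow <= 2;            % true

% 'MaxPercentLowQualityBases' with Threshold = [V1 V2] = [10 10]:
% keep the read unless more than V1 percent of bases have quality below V2.
pctLow = 100 * sum(q < 10) / numel(q);  % 20 percent
keepMaxPercent = pctLow <= 10;          % false

% 'MeanQuality' with Threshold = 20:
% keep the read unless its average quality is below the threshold.
keepMeanQuality = mean(q) >= 20;        % mean is 25, so true

% 'MinLength' with Threshold = 12:
% keep the read unless it is shorter than the threshold.
keepMinLength = numel(q) >= 12;         % length is 10, so false
```

In each case the comparison direction follows the table: a sequence is filtered out only when it is strictly beyond the threshold, so a read that sits exactly at the limit is kept.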
Properties
ErrorHandler – Function to handle errors from the run method of the block, specified as a function handle. The handle specifies the function to call if the run method encounters an error within a pipeline. For the pipeline to continue after a block fails, ErrorHandler must return a structure that is compatible with the output ports of the block. The error handling function is called with the following two inputs:
- Structure with these fields:

| Field | Description |
|---|---|
| identifier | Identifier of the error that occurred |
| message | Text of the error message |
| index | Linear index indicating which block process failed in the parallel run. By default, the index is 1 because there is only one run per block. For details on how block inputs can be split across different dimensions for multiple run calls, see Bioinformatics Pipeline SplitDimension. |

- Input structure passed to the run method when it fails
Data Types: function_handle
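An error handler along the lines described above might look like the following sketch. The function name seqFilterFallback and the placeholder values it returns are assumptions for illustration; the only requirement taken from this page is that the returned structure has fields matching the block output ports (FilteredFASTQFiles, NumFilteredIn, and NumFilteredOut).

```matlab
import bioinfo.pipeline.block.*

SF = SeqFilter;
% Register the hypothetical handler defined at the end of this script.
SF.ErrorHandler = @seqFilterFallback;

function out = seqFilterFallback(err, ~)
% Hypothetical handler: report the failure and return placeholder outputs.
% err is the structure with the identifier, message, and index fields;
% the second input (the structure passed to run) is not used here.
warning("SeqFilter run %d failed: %s", err.index, err.message);
out = struct('FilteredFASTQFiles', string.empty, ... % assumed placeholder
             'NumFilteredIn', 0, ...                 % assumed placeholder
             'NumFilteredOut', 0);                   % assumed placeholder
end
```

Whether empty or zero placeholders are appropriate depends on what the downstream blocks expect, so treat the returned values as a starting point.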
This property is read-only.
Inputs – Input ports of the block, specified as a structure. The field names of the structure are the names of the block input ports, and the field values are bioinfo.pipeline.Input objects. These objects describe the input port behaviors. The input port names are the expected field names of the input structure that you pass to the block run method.
The SeqFilter block Inputs structure has the following field:
- FASTQFiles – Names of FASTQ-formatted files with sequence and quality information. This input is required and must be satisfied. The default value is a bioinfo.pipeline.datatypes.Unset object, which means that the input value is not set yet.
Data Types: struct
This property is read-only.
Outputs – Output ports of the block, specified as a structure. The field names of the structure are the names of the block output ports, and the field values are bioinfo.pipeline.Output objects. These objects describe the output port behaviors. The field names of the output structure returned by the block run method are the same as the output port names.
The SeqFilter block Outputs structure has the following fields:
- FilteredFASTQFiles – Output file names. By default, the name of each output file consists of the input file name followed by the output suffix ('_filtered'). Tip: To see the actual location of these files, first get the results of the block. Then use the unwrap method as shown in the Examples section.
- NumFilteredIn – Number of sequences selected from each input file, returned as a scalar or an n-by-1 vector, where n is the number of input files. If there are multiple input files, the order in NumFilteredIn corresponds to the order of the input files.
- NumFilteredOut – Number of sequences excluded from each input file, returned as a scalar or an n-by-1 vector, where n is the number of input files. If there are multiple input files, the order in NumFilteredOut corresponds to the order of the input files.
Data Types: struct
Options – SeqFilter options, specified as a SeqFilterOptions object. The default value is a default SeqFilterOptions object.
Object Functions
| compile | Perform block-specific additional checks and validations | 
| copy | Copy array of handle objects | 
| emptyInputs | Create input structure for use with run method | 
| eval | Evaluate block object | 
| run | Run block object | 
Examples
Use a SeqFilter block to filter out sequences with low-quality bases, where a base is considered low-quality if its quality score is less than 15 (default).
import bioinfo.pipeline.block.*
import bioinfo.pipeline.Pipeline
FC = FileChooser(which("SRR005164_1_50.fastq"));
SF = SeqFilter;
P = Pipeline;
addBlock(P,[FC,SF]);
connect(P,FC,SF,["Files","FASTQFiles"]);
run(P);
R = results(P,SF)
R = 
  struct with fields:
    FilteredFASTQFiles: [1×1 bioinfo.pipeline.datatypes.File]
         NumFilteredIn: 3
        NumFilteredOut: 47
Call unwrap on FilteredFASTQFiles to see the location of the output file.
unwrap(R.FilteredFASTQFiles)
ans = 
    "C:\PipelineResults\SeqFilter_1\1\SRR005164_1_50_filtered.fastq"
Import the Pipeline and block objects needed for the example.
import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*
Create a pipeline.
qcpipeline = Pipeline;
Select an input FASTQ file using a FileChooser block.
fastqfile = FileChooser(which("SRR005164_1_50.fastq"));
Create a SeqFilter block.
sequencefilter = SeqFilter;
Define the filtering threshold value. Specifically, filter out sequences with a total of more than 10 low-quality bases, where a base is considered a low-quality base if its quality score is less than 20.
sequencefilter.Options.Threshold = [10 20];
Add the blocks to the pipeline.
addBlock(qcpipeline,[fastqfile,sequencefilter]);
Connect the output of the first block to the input of the second block. To do so, you need to first check the input and output port names of the corresponding blocks.
View the Outputs (port of the first block) and Inputs (port of the second block).
fastqfile.Outputs
ans = struct with fields:
    Files: [1×1 bioinfo.pipeline.Output]
sequencefilter.Inputs
ans = struct with fields:
    FASTQFiles: [1×1 bioinfo.pipeline.Input]
Connect the Files output port of the fastqfile block to the FASTQFiles input port of the sequencefilter block.
connect(qcpipeline,fastqfile,sequencefilter,["Files","FASTQFiles"]);
Next, create a UserFunction block that calls the seqqcplot function to plot the quality data of the filtered sequence data. In this case, inputFile is the required argument for the seqqcplot function. The required argument name can be anything as long as it is a valid variable name.
qcplot = UserFunction("seqqcplot",RequiredArguments="inputFile",OutputArguments="figureHandle");
Alternatively, you can also use dot notation to set up your UserFunction block.
qcplot = UserFunction;
qcplot.RequiredArguments = "inputFile";
qcplot.Function = "seqqcplot";
qcplot.OutputArguments = "figureHandle";
Add the block.
addBlock(qcpipeline,qcplot);
Check the output port names of the sequencefilter block and the input port names of the qcplot block.
sequencefilter.Outputs
ans = struct with fields:
    FilteredFASTQFiles: [1×1 bioinfo.pipeline.Output]
         NumFilteredIn: [1×1 bioinfo.pipeline.Output]
        NumFilteredOut: [1×1 bioinfo.pipeline.Output]
qcplot.Inputs
ans = struct with fields:
    inputFile: [1×1 bioinfo.pipeline.Input]
Connect the FilteredFASTQFiles port of the sequencefilter block to the inputFile port of the qcplot block.
connect(qcpipeline,sequencefilter,qcplot,["FilteredFASTQFiles","inputFile"]);
Run the pipeline to plot the sequence quality data.
run(qcpipeline);
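After the run, the filter counts can be read back with the results and unwrap functions used in the first example. The following self-contained sketch rebuilds only the FileChooser and SeqFilter part of the pipeline and prints a short summary; the summary format and the variable total are illustrative additions, not part of the original example.

```matlab
import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*

% Rebuild the two-block part of the pipeline from the example above.
qcpipeline = Pipeline;
fastqfile = FileChooser(which("SRR005164_1_50.fastq"));
sequencefilter = SeqFilter;
sequencefilter.Options.Threshold = [10 20];
addBlock(qcpipeline,[fastqfile,sequencefilter]);
connect(qcpipeline,fastqfile,sequencefilter,["Files","FASTQFiles"]);
run(qcpipeline);

% Read back the filter results and summarize them.
R = results(qcpipeline,sequencefilter);
total = R.NumFilteredIn + R.NumFilteredOut;
fprintf("Kept %d of %d sequences (%.1f%%)\n", ...
    R.NumFilteredIn, total, 100*R.NumFilteredIn/total);

% Show where the filtered file was written.
disp(unwrap(R.FilteredFASTQFiles))
```

Comparing the kept fraction across different Threshold values is a quick way to judge how aggressive a setting is before adding further downstream blocks.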

Version History
Introduced in R2023a