bwamem

Map sequence reads to reference genome using BWA

Syntax

bwamem(indexBaseName,reads1,reads2,outputFileName)

bwamem(___,options)

bwamem(___,Name,Value)

Description

bwamem(indexBaseName,reads1,reads2,outputFileName) maps the sequencing reads from reads1 and reads2 against the reference sequence and writes the results to the output file outputFileName. The input indexBaseName represents the base name (prefix) of the reference index files [1][2].

bwamem requires the BWA Support Package for Bioinformatics Toolbox™. If the support package is not installed, then the function provides a download link. For details, see Bioinformatics Toolbox Software Support Packages.

example

bwamem(___,options) uses the additional options specified by options. Specify these options after all other input arguments.

example

bwamem(___,Name,Value) uses additional options specified by one or more name-value pair arguments. For example, 'BandWidth',90 sets the maximum allowable gap length to 90.

example

Examples

collapse all

Align Reads to Reference Sequence Using BWA

This example uses:

Open Live Script

This example requires the BWA Support Package for Bioinformatics Toolbox™. If the support package is not installed, the software provides a download link. For details, see Bioinformatics Toolbox Software Support Packages.

Build a set of index files for the Drosophila genome. This example uses the reference sequence Dmel_chr4.fa, provided with the toolbox. The 'Prefix' argument lets you define the prefix of the output index files. You can also include the file path information. For this example, define the prefix as Dmel_chr4 and save the index files in the current directory.

bwaindex('Dmel_chr4.fa','Prefix','./Dmel_chr4');

As an alternative to specifying name-value pair arguments, you can use the BWAIndexOptions object to specify the indexing options.

indexOpt = BWAIndexOptions;
indexOpt.Prefix = './Dmel_chr4';
indexOpt.Algorithm = 'bwtsw';
bwaindex('Dmel_chr4.fa',indexOpt);

Once the index files are ready, map the read sequences to the reference using bwamem. Two pair-end read input files are already provided with the toolbox. Using name-value pair arguments, you can specify different alignment options, such as the number of parallel threads to use.

bwamem('Dmel_chr4','SRR6008575_10k_1.fq','SRR6008575_10k_2.fq','SRR6008575_10k_chr4.sam','NumThreads',4);

Alternatively, you can use BWAMEMoptions to specify the alignment options.

alignOpt = BWAMEMOptions;
alignOpt.NumThreads = 4;
bwamem('Dmel_chr4','SRR6008575_10k_1.fq','SRR6008575_10k_2.fq','SRR6008575_10k_chr4.sam',alignOpt)

Input Arguments

collapse all

`indexBaseName` — Base name of reference index files
character vector | string

Base name (prefix) of the reference index files, specified as a character vector or string. For example, the base name of an index file 'Dmel_chr4.bwt' is 'Dmel_chr4'.

The index files are in the AMB, ANN, BWT, PAC, and SA file formats.

Example: 'Dmel_chr4'

Data Types: char | string

`reads1` — Name of file with first mate reads or single-end reads
character vector | string

Name of the file with the first mate reads or single-end reads, specified as a character vector or string.

For paired-end data, the sequences in reads1 must correspond read-for-read to sequences in reads2.

Example: 'SRR6008575_10k_1.fq'

Data Types: char | string

`reads2` — Name of file with second mate reads
character vector | string | `[]`

Name of the file with the second mate reads, specified as a character vector or string.

Specify reads2 as empty ([], '', or "") if the data consists of single-end reads only.

Example: 'SRR6008575_10k_2.fq'

Data Types: char | string

`outputFileName` — Output file name
character vector | string

Output file name, specified as a character vector or string. This file contains the mapping results.

Example: 'SRR6008575_10k_chr4.sam'

Data Types: char | string

`options` — Additional options for mapping
`BWAMEMOptions` object | character vector | string

Additional options for mapping, specified as a BWAMEMOptions object, character vector, or string. The character vector or string must be in the bwa mem native syntax (prefixed by a dash). If you specify a BWAMEMOptions object, the software uses only those properties that are set or modified.

Data Types: char | string

Name-Value Arguments

collapse all

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: bwamem(indexbasename,reads1,reads2,outputfile,'BandWidth',90) sets 90 as the maximum allowable gap.

`AlternativeHitsThreshold` — Threshold for determining which hits receive XA tag in output SAM file
`[5 200]` (default) | nonnegative integer | two-element numeric vector

Threshold for determining which hits receive an XA tag in the output SAM file, specified as a nonnegative integer n or two-element numeric vector [n m], where n and m must be nonnegative integers.

If a read has less than n hits with a score greater than 80% of the best score for that read, all hits receive an XA tag in the output SAM file.

When you also specify m, the software returns up to m hits if the hit list contains a hit to an ALT contig.

Data Types: double

`AppendReadCommentsToSAM` — Flag to append FASTA or FASTQ comments to output SAM file
`false` (default) | `true`

Flag to append FASTA or FASTQ comments to the output SAM file, specified as true or false. The comments appear as text after a space in the file header.

Data Types: logical

`BandWidth` — Maximum allowable gap length
`100` (default) | nonnegative integer

Maximum allowable gap length, specified as a nonnegative integer.

Data Types: double

`BasesPerBatch` — Number of bases per batch
`[]` (default) | positive integer

Number of bases per batch, specified as a positive integer.

If you do not specify BasesPerBatch, the software uses 1e7 * NumThreads by default. NumThreads is the number of parallel threads available when you run bwamem.

If you specify BasesPerBatch, the software uses that exact number and does not multiply the number by NumThreads. This rule applies regardless of whether you explicitly set NumThreads or not.

However, if you specify NumThreads but not BasesPerBatch, the software uses 1e7 * NumThreads.

The batch size is proportional to the number of parallel threads in use. Using different numbers of threads might produce different outputs. Specifying this option helps with the reproducibility of results.

Data Types: double

`ClipPenalty` — Penalty for clipped alignments
`[5 5]` (default) | nonnegative integer | two-element numeric vector

Penalty for clipped alignments, specified as a nonnegative integer or two-element numeric vector. Each read has the best score for an alignment that spans the length of the read. The software does not clip alignments that do not span the length of the read and do not score higher than the sum of ClipPenalty and the best score of the full-length read.

Specify a nonnegative integer to set the same penalty for both 5' and 3' clipping.

Specify a two-element numeric vector to set different penalties for 5' and 3' clipping.

Data Types: double

`DropChainFraction` — Threshold for dropping chains relative to longest overlapping chain
`0.5` (default) | scalar between `0` and `1`

Threshold for dropping chains relative to the longest overlapping chain, specified as a scalar between 0 and 1.

The software drops chains that are shorter than DropChainFraction * (longest overlapping chain length).

Data Types: double

`DropChainLength` — Minimum number of bases
`0` (default) | nonnegative integer

Minimum number of bases in seeds forming a chain, specified as a nonnegative integer. The software drops chains shorter than DropChainLength.

Data Types: double

`ExtraCommand` — Additional commands
`""` (default) | character vector | string

Additional commands, specified as a character vector or string.

The commands must be in the native syntax (prefixed by one or two dashes). Use this option to apply undocumented flags and flags without corresponding MATLAB^® properties.

Example: 'ExtraCommand','-y'

Data Types: char | string

`FastaHeaderToXR` — Flag to include FASTA header in XR tag
`false` (default) | `true`

Flag to include the FASTA header in the XR tag, specified as true or false.

Data Types: logical

`GapExtensionPenalty` — Gap extension penalty
`[1 1]` (default) | nonnegative integer | two-element numeric vector

Gap extension penalty, specified as a nonnegative integer or two-element numeric vector [n m]. n is the penalty for extending a deletion. m is the penalty for extending an insertion.

If you specify a nonnegative integer, the software uses it as the penalty for extending a deletion or an insertion.

Data Types: double

`GapOpenPenalty` — Gap opening penalty
`[6 6]` (default) | nonnegative integer | two-element numeric vector

Gap opening penalty, specified as a nonnegative integer or two-element numeric vector [n m]. n is the penalty for opening a deletion. m is the penalty for opening an insertion.

If you specify a nonnegative integer, the software uses it as the penalty for opening a deletion or an insertion.

Data Types: double

`HeaderInsert` — Text to insert into header of output SAM file
`[0x0 string]` (default) | character vector | string

Text to insert into the header of the output SAM file, specified as a character vector or string.

Use one of the following:

Character vector or string that starts with @ to insert the exact text to the SAM header
Character vector or string that is a file name, where each line in the file must start with @

Data Types: char | string

`IncludeAll` — Flag to apply all available options
`false` (default) | `true`

Flag to include all available options with the corresponding default values when converting to the original syntax, specified as true or false.

The original (native) syntax is prefixed by one or two dashes. By default, the function converts only the specified options. If the value is true, the software converts all available options, with default values for unspecified options, to the original syntax.

Note

If you set IncludeAll to true, the software converts all available properties, using default values for unspecified properties. The only exception is when the default value of a property is NaN, Inf, [], '', or "". In this case, the software does not translate the corresponding property.

Data Types: logical

`InsertSizeStatistics` — Insert size distribution parameters
`[1x0 double]` (default) | four-element numeric array

Insert size distribution parameters, specified as a four-element numeric array [mean std max min].

mean is the mean insert size.
std is the standard deviation.
max is the maximum insert size.
min is the minimum insert size.

If you specify n elements array, where n is less than four, the elements specify the first n distribution parameters. By default, the software infers unspecified parameters from data.

Data Types: double

`MarkShortSplitsSecond` — Flag to mark shorter split hits as secondary
`false` (default) | `true`

Flag to mark the shorter split hits as secondary in the SAM flag, specified as true or false.

Data Types: logical

`MarkSmallestCoordinatePrimary` — Flag to mark segment with smallest coordinates as primary
`false` (default) | `true`

Flag to mark the segment with the smallest coordinates as primary when the alignment is split, specified as true or false.

Data Types: logical

`MatchScore` — Score for sequence match
`1` (default) | nonnegative integer

Score for a sequence match, specified as a nonegative integer.

Data Types: double

`MaxMemOccurrence` — Maximum number of MEM occurrences
`500` (default) | positive integer

Maximum number of MEM (maximal exact match) occurrences for each read before it is discarded, specified as a positive integer.

Data Types: double

`MaxRoundsMateRescue` — Maximum number of rounds of mate rescue
`50` (default) | nonnegative integer

Maximum number of rounds of mate rescue for each read, specified as a nonnegative integer. The software uses the Smith-Waterman (SW) algorithm for the mate rescue.

Data Types: double

`MinSeedLength` — Minimum seed length
`19` (default) | positive integer

Minimum seed length, specified as a positive integer. The software discards any matches shorter than the minimum seed length.

Data Types: double

`MismatchPenalty` — Penalty for alignment mismatch
`4` (default) | nonnegative integer

Penalty for an alignment mismatch, specified as a nonnegative integer.

Data Types: double

`NumThreads` — Number of parallel threads
`1` (default) | positive integer

Number of parallel threads to use, specified as a positive integer. Threads are run on separate processors or cores. Increasing the number of threads generally improves the runtime significantly, but increases the memory footprint.

Data Types: double

`OutputAllAlignments` — Flag to return all found alignments
`false` (default) | `true`

Flag to return all found alignments including unpaired and paired-end reads, specified as true or false. If the value is true, the software returns all found alignments and marks them as secondary alignments.

Data Types: logical

`OutputScoreThreshold` — Score threshold for returning alignments
`30` (default) | positive integer

Score threshold for returning alignments, specified as a positive integer. Specify the minimum score that alignments must have to be in the output file.

Data Types: double

`ReadGroupLine` — Text to insert into read group header
`[0x0 string]` (default) | character vector | string

Text to insert into the read group (RG) header line in the output file, specified as a character vector or string.

Data Types: char | string

`ReadType` — Type of reads to align
`[0x0 string]` (default) | `'pacbio` | `'ont2d` | `'intractg'`

Type of reads to align, specified as a character vector or string. Each read type has different default parameter values to use during alignment. You can overwrite any parameters. Valid options are:

'pacbio' — PacBio reads
'ont2d' — Oxford nanopore 2D reads
'intractg' — Intra-species contigs

The parameter values are as follows.

'pacbio'

MinSeedLength = 17
DropChainLength = 40
SeedSplitRatio = 10
MatchScore = 1
MismatchPenalty = 1
GapOpenPenalty = 1
GapExtensionPenalty = 1
ClipPenalty = 0

The equivalent native syntax is '-k17 -W40 -r10 -A1 -B1 -O1 -E1 -L0'.

'ont2d'

MinSeedLength = 14
DropChainLength = 20
SeedSplitRatio = 10
MatchScore = 1
MismatchPenalty = 1
GapOpenPenalty = 1
GapExtensionPenalty = 1
ClipPenalty = 0

The equivalent native syntax is '-k14 -W20 -r10 -A1 -B1 -O1 -E1 -L0'.

'intractg'

MismatchPenalty = 9
GapOpenPenalty = 16
ClipPenalty = 5

The equivalent native syntax is '-B9 -O16 -L5'.

Data Types: char | string

`ReduceSupplementaryMAPQ` — Flag to reduce mapping quality (MAPQ) score of supplementary alignments
`true` (default) | `false`

Flag to reduce the mapping quality (MAPQ) score of supplementary alignments, specified as true or false.

Data Types: logical

`SeedSplitRatio` — Threshold for reseeding
`1.50` (default) | nonnegative integer

Threshold for reseeding, specified as a nonnegative integer. Specify the seed length at which reseeding happens relative to the minimum seed length MinSeedLength. Specifically, if a MEM (maximal exact match) is longer than MinSeedLength * SeedSplitRatio, reseeding occurs.

Data Types: double

`SkipMateRescue` — Flag to skip mate rescue
`false` (default) | `true`

Flag to skip mate rescue, specified as true or false. Mate rescue uses the Smith-Waterman (SW) algorithm to align unmapped reads with mates that are properly aligned.

Data Types: logical

`SkipPairing` — Flag to skip read pairing
`false` (default) | `true`

Flag to skip read pairing, specified as true or false. If true, for paired-end reads, the software uses the Smith-Waterman (SW) algorithm to rescue missing hits only and does not try to find hits that fit a proper pair.

Data Types: logical

`SmartPairing` — Flag to perform smart pairing
`false` (default) | `true`

Flag to perform smart pairing, specified as true or false. If the value is true, the software pairs adjacent reads that are in the same file and have the same name. Such FASTQ files are also known as interleaved files.

Data Types: logical

`SoftClipSupplementary` — Flag to soft clip supplemental alignments
`false` (default) | `true`

Flag to soft clip supplemental alignments, specified as true or false. If the value is true, the software soft clips both supplemental alignments and a primary alignment.

The default value is false, which means that the software soft clips the primary alignment and hard clips the supplemental alignments.

Data Types: logical

`TreatAltAsPrimary` — Flag to treat ALT contigs as part of primary assembly
`false` (default) | `true`

Flag to treat ALT contigs as part of the primary assembly, specified as true or false.

Data Types: logical

`UnpairedReadPenalty` — Penalty for mapping read pairs as unpaired
`17` (default) | nonnegative integer

Penalty for mapping read pairs as unpaired, specified as a nonnegative integer.

The alignment score for a paired read pair is read1 score + read2 score - insert penalty. The alignment score for an unpaired read pair is read1 score + read2 score - UnpairedReadPenalty. The software compares these two scores to force read pairing. A larger UnpairedReadPenalty value leads to a more aggressive read pairing.

Data Types: double

`Verbosity` — Verbosity level of information printed
`0` (default) | nonnegative integer

Verbosity level of information printed to the MATLAB command line while the software is running, specified as a nonnegative integer. Valid options are:

0 — For disabling all outputs to the command line.
1 — For printing error messages.
2 — For printing warning and error messages.
3 — For printing all messages.
4 — For debugging purposes only.

Data Types: double

`ZDropOff` — Cutoff for Smith-Waterman extension
`100` (default) | nonnegative integer

Cutoff for the Smith-Waterman (SW) extension, specified as a nonnegative integer. The software uses the following expression:

$| i - j | * M a t c h S c o r e + Z D r o p O f f$ , where i and j are the current positions of the query and reference, respectively. When the difference between the best score and current extension score is larger than this expression value, the software terminates the SW extension.

Data Types: double

References

[1] Li, Heng, and Richard Durbin. “Fast and Accurate Short Read Alignment with Burrows-Wheeler Transform.” Bioinformatics 25, no. 14 (July 15, 2009): 1754–60. https://doi.org/10.1093/bioinformatics/btp324.

[2] Li, Heng, and Richard Durbin. “Fast and Accurate Long-Read Alignment with Burrows–Wheeler Transform.” Bioinformatics 26, no. 5 (March 1, 2010): 589–95. https://doi.org/10.1093/bioinformatics/btp698.

Version History

Introduced in R2020b

bwamem

Syntax

Description

Examples

Align Reads to Reference Sequence Using BWA

Input Arguments

indexBaseName — Base name of reference index files character vector | string

reads1 — Name of file with first mate reads or single-end reads character vector | string

reads2 — Name of file with second mate reads character vector | string | []

outputFileName — Output file name character vector | string

options — Additional options for mapping BWAMEMOptions object | character vector | string

Name-Value Arguments

AlternativeHitsThreshold — Threshold for determining which hits receive XA tag in output SAM file [5 200] (default) | nonnegative integer | two-element numeric vector

AppendReadCommentsToSAM — Flag to append FASTA or FASTQ comments to output SAM file false (default) | true

BandWidth — Maximum allowable gap length 100 (default) | nonnegative integer

BasesPerBatch — Number of bases per batch [] (default) | positive integer

ClipPenalty — Penalty for clipped alignments [5 5] (default) | nonnegative integer | two-element numeric vector

DropChainFraction — Threshold for dropping chains relative to longest overlapping chain 0.5 (default) | scalar between 0 and 1

DropChainLength — Minimum number of bases 0 (default) | nonnegative integer

ExtraCommand — Additional commands "" (default) | character vector | string

FastaHeaderToXR — Flag to include FASTA header in XR tag false (default) | true

GapExtensionPenalty — Gap extension penalty [1 1] (default) | nonnegative integer | two-element numeric vector

GapOpenPenalty — Gap opening penalty [6 6] (default) | nonnegative integer | two-element numeric vector

HeaderInsert — Text to insert into header of output SAM file [0x0 string] (default) | character vector | string

IncludeAll — Flag to apply all available options false (default) | true

InsertSizeStatistics — Insert size distribution parameters [1x0 double] (default) | four-element numeric array

MarkShortSplitsSecond — Flag to mark shorter split hits as secondary false (default) | true

MarkSmallestCoordinatePrimary — Flag to mark segment with smallest coordinates as primary false (default) | true

MatchScore — Score for sequence match 1 (default) | nonnegative integer

MaxMemOccurrence — Maximum number of MEM occurrences 500 (default) | positive integer

MaxRoundsMateRescue — Maximum number of rounds of mate rescue 50 (default) | nonnegative integer

MinSeedLength — Minimum seed length 19 (default) | positive integer

MismatchPenalty — Penalty for alignment mismatch 4 (default) | nonnegative integer

NumThreads — Number of parallel threads 1 (default) | positive integer

OutputAllAlignments — Flag to return all found alignments false (default) | true

OutputScoreThreshold — Score threshold for returning alignments 30 (default) | positive integer

ReadGroupLine — Text to insert into read group header [0x0 string] (default) | character vector | string

ReadType — Type of reads to align [0x0 string] (default) | 'pacbio | 'ont2d | 'intractg'

ReduceSupplementaryMAPQ — Flag to reduce mapping quality (MAPQ) score of supplementary alignments true (default) | false

SeedSplitRatio — Threshold for reseeding 1.50 (default) | nonnegative integer

SkipMateRescue — Flag to skip mate rescue false (default) | true

SkipPairing — Flag to skip read pairing false (default) | true

SmartPairing — Flag to perform smart pairing false (default) | true

SoftClipSupplementary — Flag to soft clip supplemental alignments false (default) | true

TreatAltAsPrimary — Flag to treat ALT contigs as part of primary assembly false (default) | true

UnpairedReadPenalty — Penalty for mapping read pairs as unpaired 17 (default) | nonnegative integer

Verbosity — Verbosity level of information printed 0 (default) | nonnegative integer

ZDropOff — Cutoff for Smith-Waterman extension 100 (default) | nonnegative integer

References

Version History

See Also

Topics

`indexBaseName` — Base name of reference index files
character vector | string

`reads1` — Name of file with first mate reads or single-end reads
character vector | string

`reads2` — Name of file with second mate reads
character vector | string | `[]`

`outputFileName` — Output file name
character vector | string

`options` — Additional options for mapping
`BWAMEMOptions` object | character vector | string

`AlternativeHitsThreshold` — Threshold for determining which hits receive XA tag in output SAM file
`[5 200]` (default) | nonnegative integer | two-element numeric vector

`AppendReadCommentsToSAM` — Flag to append FASTA or FASTQ comments to output SAM file
`false` (default) | `true`

`BandWidth` — Maximum allowable gap length
`100` (default) | nonnegative integer

`BasesPerBatch` — Number of bases per batch
`[]` (default) | positive integer

`ClipPenalty` — Penalty for clipped alignments
`[5 5]` (default) | nonnegative integer | two-element numeric vector

`DropChainFraction` — Threshold for dropping chains relative to longest overlapping chain
`0.5` (default) | scalar between `0` and `1`

`DropChainLength` — Minimum number of bases
`0` (default) | nonnegative integer

`ExtraCommand` — Additional commands
`""` (default) | character vector | string

`FastaHeaderToXR` — Flag to include FASTA header in XR tag
`false` (default) | `true`

`GapExtensionPenalty` — Gap extension penalty
`[1 1]` (default) | nonnegative integer | two-element numeric vector

`GapOpenPenalty` — Gap opening penalty
`[6 6]` (default) | nonnegative integer | two-element numeric vector

`HeaderInsert` — Text to insert into header of output SAM file
`[0x0 string]` (default) | character vector | string

`IncludeAll` — Flag to apply all available options
`false` (default) | `true`

`InsertSizeStatistics` — Insert size distribution parameters
`[1x0 double]` (default) | four-element numeric array

`MarkShortSplitsSecond` — Flag to mark shorter split hits as secondary
`false` (default) | `true`

`MarkSmallestCoordinatePrimary` — Flag to mark segment with smallest coordinates as primary
`false` (default) | `true`

`MatchScore` — Score for sequence match
`1` (default) | nonnegative integer

`MaxMemOccurrence` — Maximum number of MEM occurrences
`500` (default) | positive integer

`MaxRoundsMateRescue` — Maximum number of rounds of mate rescue
`50` (default) | nonnegative integer

`MinSeedLength` — Minimum seed length
`19` (default) | positive integer

`MismatchPenalty` — Penalty for alignment mismatch
`4` (default) | nonnegative integer

`NumThreads` — Number of parallel threads
`1` (default) | positive integer

`OutputAllAlignments` — Flag to return all found alignments
`false` (default) | `true`

`OutputScoreThreshold` — Score threshold for returning alignments
`30` (default) | positive integer

`ReadGroupLine` — Text to insert into read group header
`[0x0 string]` (default) | character vector | string

`ReadType` — Type of reads to align
`[0x0 string]` (default) | `'pacbio` | `'ont2d` | `'intractg'`

`ReduceSupplementaryMAPQ` — Flag to reduce mapping quality (MAPQ) score of supplementary alignments
`true` (default) | `false`

`SeedSplitRatio` — Threshold for reseeding
`1.50` (default) | nonnegative integer

`SkipMateRescue` — Flag to skip mate rescue
`false` (default) | `true`

`SkipPairing` — Flag to skip read pairing
`false` (default) | `true`

`SmartPairing` — Flag to perform smart pairing
`false` (default) | `true`

`SoftClipSupplementary` — Flag to soft clip supplemental alignments
`false` (default) | `true`

`TreatAltAsPrimary` — Flag to treat ALT contigs as part of primary assembly
`false` (default) | `true`

`UnpairedReadPenalty` — Penalty for mapping read pairs as unpaired
`17` (default) | nonnegative integer

`Verbosity` — Verbosity level of information printed
`0` (default) | nonnegative integer

`ZDropOff` — Cutoff for Smith-Waterman extension
`100` (default) | nonnegative integer