blastplusdatabase

Create a local BLAST+ database

Since R2024a

Description

blastplusdatabase(databaseType,inputFile,outputDatabase) creates a local BLAST+ database [1][2] of type databaseType from the input sequences in inputFile. The created BLAST+ database has the base name outputDatabase. The function also generates multiple index files with the same base name, which are used automatically when you perform a search on this local database.

blastplusdatabase requires the BLAST+ Support Package for Bioinformatics Toolbox™. If this support package is not installed, then the function provides a download link. For details, see Bioinformatics Toolbox Software Support Packages.

blastplusdatabase(databaseType,inputFile,outputDatabase,databaseOpts) uses additional options specified by databaseOpts.

blastplusdatabase(databaseType,inputFile,outputDatabase,Name=Value) specifies additional options using one or more name-value arguments. For example, use the InputType name-value argument to specify the input file type.

Examples

Download some paired-end sequencing data in the FASTA format using the accession run number SRR26273031.

databaseFasta = srafasterqdump("SRR26273031",FastaOutput=true)

Create a local nucleotide database using the downloaded FASTA file. Specify "SRR26273031_nucl_db" as the base name of the output database. When creating the database, the function also generates multiple index files with the same base name. The blastplus function uses these index files automatically when you search the database later in this example.

blastplusdatabase("nucleotide","SRR26273031.fasta","SRR26273031_nucl_db");
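To confirm which files the call produced, you can list everything in the current folder whose name starts with the database base name. This is a minimal sketch that uses only the base MATLAB dir function. It assumes the database was written to the current folder, and it makes no assumption about the file extensions, which depend on the BLAST+ version.

```matlab
% List files whose names start with the database base name.
% Assumes the database was created in the current folder.
baseName = "SRR26273031_nucl_db";
listing = dir(baseName + "*");

% Collect the file names in a string column vector and display them.
dbFiles = string({listing.name})';
disp(dbFiles)
```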

You can also specify additional database creation options using a MakeDatabaseOptions object. For instance, specify the title of the database.

dbopts = bioinfo.blastplus.MakeDatabaseOptions;
dbopts.Title = "SRR26273031_Nucleotide_DB"
dbopts = 
  MakeDatabaseOptions with properties:

   Default properties:
        ExtraCommand: ""
          IncludeAll: 0
           InputType: "fasta"
    ParseSequenceIDs: 0
             Version: "2.14.0"

   Modified properties:
               Title: "SRR26273031_Nucleotide_DB"

You can then use the options object to make the database.

blastplusdatabase("nucleotide","SRR26273031.fasta","SRR26273031_nucl_db",dbopts);

Alternatively, you can specify options, such as the title of the database, by using name-value arguments. For example:

blastplusdatabase("nucleotide","SRR26273031.fasta","SRR26273031_nucl_db",Title="SRR26273031_Nucleotide_DB");

To reset the property values to their default values, use the reset function.

dopts2 = reset(dbopts)
dopts2 = 
  MakeDatabaseOptions with properties:

   Default properties:
        ExtraCommand: ""
          IncludeAll: 0
           InputType: "fasta"
    ParseSequenceIDs: 0
               Title: [1×0 string]
             Version: "2.14.0"

   Modified properties:
    No properties.

Search the database using the FASTA file queryFile.fasta, which contains two nucleotide query sequences. This file is provided with the toolbox. Use the blastn query program, which lets you search nucleotide queries against a nucleotide database. Specify "search1" as the name of the output report file. By default, the report file format is the traditional BLAST pairwise format. This format presents each query-subject pair alignment in detail.

blastplus("blastn","queryFile.fasta","SRR26273031_nucl_db","search1");

Open the file to review the search results. The first query sequence returns no hits, while the second query sequence returns multiple hits.

open search1;

You can also modify search options by creating a corresponding options object for the blastn query program. Use blastplusoptions or bioinfo.blastplus.*Options to create the options object. For instance, change the report format to an XML format.

bnopts = blastplusoptions("blastn"); % Or use bioinfo.blastplus.BLASTNOptions
bnopts.ReportFormat = "BLASTXML";
blastplus("blastn","queryFile.fasta","SRR26273031_nucl_db","search2_xml",bnopts);
open search2_xml;

Alternatively, you can set the value of a property of the options object, such as ReportFormat, using name-value argument syntax. For example:

blastplus("blastn","queryFile.fasta","SRR26273031_nucl_db","search2_xml",ReportFormat="BLASTXML");

You can use other query programs to search the database. For instance, use tblastx to search translated nucleotide queries against a translated nucleotide database. Both query sequences return hits for this search. Use the compact tabular format for the report. For details about the generated columns and other report formats, see ReportFormat.

blastplus("tblastx","queryFile.fasta","SRR26273031_nucl_db","search3_tab",ReportFormat="Tabular");
open search3_tab;

Delete the reports and downloaded FASTA file.

delete search1 search2_xml search3_tab SRR26273031.fasta

Input Arguments

databaseType

Type of database to create, specified as "nucleotide" or "protein".

The blastn, tblastn, and tblastx query programs work with a nucleotide database. The blastp and blastx query programs work with a protein database.

Data Types: char | string
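The pairing between query programs and database types stated above can be kept in a small lookup, so that you choose the right databaseType before building a database. This is an illustrative sketch in base MATLAB; the variable names are hypothetical and are not part of the toolbox.

```matlab
% Map each query program to the database type it searches,
% following the pairing described in the text above.
programs = ["blastn" "tblastn" "tblastx" "blastp" "blastx"];
dbTypes  = ["nucleotide" "nucleotide" "nucleotide" "protein" "protein"];
requiredType = dictionary(programs, dbTypes);

% Look up the database type needed for a planned search.
queryProgram = "tblastn";
databaseType = requiredType(queryProgram);
fprintf("%s searches a %s database.\n", queryProgram, databaseType);
```

You can then pass databaseType as the first argument to blastplusdatabase.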

inputFile

Name of the sequence file, specified as a character vector or string scalar. The file must be a text file with one or more sequences in the FASTA format. Specify a filename or a full file path and filename.

Ensure that the file path and filename contain no spaces.

Data Types: char | string
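Because the path and filename must not contain spaces, it can help to check the input before you create the database. The following is a minimal sketch that uses base MATLAB functions for the checks; the file name is the one from the example on this page, so substitute your own.

```matlab
% Input file from the example on this page; replace with your own file.
inputFile = "SRR26273031.fasta";

% Fail early if the path contains whitespace or the file does not exist.
hasSpaces = any(isspace(char(inputFile)));
if hasSpaces
    error("Input path contains spaces: %s", inputFile);
end
if ~isfile(inputFile)
    error("Input file not found: %s", inputFile);
end

blastplusdatabase("nucleotide",inputFile,"SRR26273031_nucl_db");
```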

outputDatabase

Base name of the output BLAST database, specified as a character vector or string scalar.

Data Types: char | string

databaseOpts

BLAST database options, specified as a MakeDatabaseOptions object.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: blastplusdatabase("nucleotide","SRR26273031.fasta","SRR26273031_nucl_db",Title="SRR26273031_Nucleotide_DB") specifies the title of the database.

ExtraCommand

Additional commands, specified as a character vector or string scalar.

The commands must be in the native syntax (prefixed by one dash). Use this option to apply undocumented flags and flags without corresponding MATLAB® properties.

Example: "-lcase_masking"

Data Types: char | string
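As a sketch, the following passes a native flag through ExtraCommand by using an options object. The -hash_index flag is used for illustration only, on the assumption that it is supported by your BLAST+ version and has no corresponding MATLAB property. Check the BLAST+ manual for the flags that apply to database creation.

```matlab
% Pass a native flag (prefixed by one dash) through ExtraCommand.
% "-hash_index" is an illustrative choice, not a recommendation.
dbopts = bioinfo.blastplus.MakeDatabaseOptions;
dbopts.ExtraCommand = "-hash_index";

% Create the database with the extra flag applied.
blastplusdatabase("nucleotide","SRR26273031.fasta","SRR26273031_nucl_db",dbopts);
```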

IncludeAll

Flag to include all object properties with their corresponding default values when converting to the original option syntax, specified as a numeric or logical 1 (true) or 0 (false). You can convert properties to the original syntax prefixed by a dash (such as -dbtype nucl) by using the getCommand function.

When IncludeAll=false and you call getCommand(optionsObject), the software converts only the specified properties. If the value is true, getCommand converts all available properties, using default values for unspecified properties, to the original syntax.

Note

If you set IncludeAll to true, the software translates all available properties, with default values for unspecified properties. The only exception is that when the default value of a property is NaN, Inf, [], '', or "", then the software does not translate the corresponding property.

Example: true

Data Types: logical
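To see the effect of IncludeAll, you can call getCommand on the same options object with the flag off and then on. This is a minimal sketch; the returned command text depends on the installed BLAST+ version, so no expected output is shown.

```matlab
% Create an options object and modify one property.
dbopts = bioinfo.blastplus.MakeDatabaseOptions;
dbopts.Title = "SRR26273031_Nucleotide_DB";

% IncludeAll is false by default, so only modified properties are converted.
cmdModified = getCommand(dbopts)

% With IncludeAll=true, properties with default values are converted too,
% except those whose default is NaN, Inf, [], '', or "".
dbopts.IncludeAll = true;
cmdAll = getCommand(dbopts)
```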

InputType

Input file type, specified as one of the following:

  • "fasta" — FASTA format

  • "blastdb" — BLAST database format

  • "asn1_txt" — Seq-entries in the text ASN.1 format

  • "asn1_bin" — Seq-entries in the binary ASN.1 format

Data Types: char | string

ParseSequenceIDs

Flag to parse bar-delimited sequence identifiers, such as gi|129295, in a FASTA input, specified as a numeric or logical 1 (true) or 0 (false).

When the input file is a FASTA file and ParseSequenceIDs=true, BLAST+ extracts database identifiers from the sequence IDs in the FASTA file and saves them in the created database. These identifiers are useful for filtering or limiting search results, for instance, by taxonomy. For details, see the BLAST Command Line Applications User Manual. When ParseSequenceIDs=false (default), BLAST+ treats each sequence ID in the file only as a unique identifier for each sequence.

Data Types: double | logical

Title

Title for the BLAST database, specified as a character vector or string scalar.

Data Types: char | string

References

[1] Camacho, Christiam, George Coulouris, Vahram Avagyan, Ning Ma, Jason Papadopoulos, Kevin Bealer, and Thomas L Madden. “BLAST+: Architecture and Applications.” BMC Bioinformatics 10, no. 1 (December 2009): 421.

[2] “BLAST: Basic Local Alignment Search Tool.” https://blast.ncbi.nlm.nih.gov/Blast.cgi.

Version History

Introduced in R2024a