
XMLImportOptions

Import options object for XML files

Description

An XMLImportOptions object enables you to specify how MATLAB® imports structured, tabular data from XML files. The object contains properties that control the data import process, including the handling of errors and missing data.

Creation

You can create an XMLImportOptions object using either the xmlImportOptions function (described here) or the detectImportOptions function:

  • Use xmlImportOptions to define the import properties based on your import requirements.

  • Use detectImportOptions to detect and populate the import properties based on the contents of the XML file specified in filename.

    opts = detectImportOptions(filename)

Description

opts = xmlImportOptions creates an XMLImportOptions object with one variable.

opts = xmlImportOptions('NumVariables',numVars) creates the object with the number of variables specified in numVars.


opts = xmlImportOptions(___,Name,Value) specifies additional properties for an XMLImportOptions object using one or more name-value arguments.

Input Arguments


numVars: Number of variables, specified as a positive scalar integer.
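As a rough illustration of the creation syntaxes above, the following sketch creates an options object with three variables and then renames them. The names Name, Age, and Year are placeholders chosen for this example; they are not read from any file.

```matlab
% Create an options object with three variables.
% Until names are assigned, the variables use default names (Var1, Var2, ...).
opts = xmlImportOptions('NumVariables',3);
defaultNames = opts.VariableNames;

% Assign illustrative names (hypothetical, chosen for this sketch).
opts.VariableNames = {'Name','Age','Year'};
disp(opts.VariableNames)
```

Setting the properties after construction keeps the constructor call short; the same properties could, in principle, be passed as name-value arguments instead.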

Properties


Variable Properties

Variable names, specified as a cell array of character vectors or string array. The VariableNames property contains the names to use when importing variables.

If the data contains N variables, but no variable names are specified, then the VariableNames property contains {'Var1','Var2',...,'VarN'}.

To support invalid MATLAB identifiers as variable names, such as variable names containing spaces and non-ASCII characters, set the value of VariableNamingRule to 'preserve'.

Example: opts.VariableNames returns the current (detected) variable names.

Example: opts.VariableNames(3) = {'Height'} changes the name of the third variable to Height.

Data Types: char | string | cell

Flag to preserve variable names, specified as the comma-separated pair consisting of VariableNamingRule and either 'modify' or 'preserve'.

  • 'modify' — Convert invalid variable names (as determined by the isvarname function) to valid MATLAB identifiers.

  • 'preserve' — Preserve variable names that are not valid MATLAB identifiers such as variable names that include spaces and non-ASCII characters.

Starting in R2019b, variable names and row names can include any characters, including spaces and non-ASCII characters. Also, they can start with any characters, not just letters. Variable and row names do not have to be valid MATLAB identifiers (as determined by the isvarname function). To preserve these variable names and row names, set the value of VariableNamingRule to 'preserve'.

Data Types: char | string

Data types of the variables, specified as a cell array of character vectors or a string array containing a set of valid data type names. The VariableTypes property designates the data types to use when importing variables.

To update the VariableTypes property, use the setvartype function.

Example: opts.VariableTypes returns the current variable data types.

Example: opts = setvartype(opts,'Height',{'double'}) changes the data type of the variable Height to double.
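A minimal sketch of updating VariableTypes with setvartype, assuming an options object with three hypothetical variables (Name, Age, and Year) that is not tied to any particular file:

```matlab
% Build an options object with three illustrative variables.
opts = xmlImportOptions('NumVariables',3);
opts.VariableNames = {'Name','Age','Year'};

% Change one variable by name, then several variables at once.
opts = setvartype(opts,'Age','double');
opts = setvartype(opts,{'Name','Year'},'string');

disp(opts.VariableTypes)
```

Note that setvartype returns a modified copy, so the result has to be assigned back to opts.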

Subset of variables to import, specified as a character vector, string scalar, cell array of character vectors, string array or an array of numeric indices.

SelectedVariableNames must be a subset of names contained in the VariableNames property. By default, SelectedVariableNames contains all the variable names from the VariableNames property, which means that all variables are imported.

Use the SelectedVariableNames property to import only the variables of interest. Specify a subset of variables using the SelectedVariableNames property and use readtable to import only that subset.

To support invalid MATLAB identifiers as variable names, such as variable names containing spaces and non-ASCII characters, set the value of VariableNamingRule to 'preserve'.

Example: opts.SelectedVariableNames = {'Height','LastName'} selects only two variables, Height and LastName, for the import operation.

Example: opts.SelectedVariableNames = [1 5] selects only two variables, the first variable and the fifth variable, for the import operation.

Example: T = readtable(filename,opts) returns a table containing only the variables specified in the SelectedVariableNames property of the opts object.

Data Types: uint16 | uint32 | uint64 | char | string | cell
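The two ways of specifying a subset described above can be sketched as follows. The variable names are hypothetical and no file is read here; passing opts to readtable would then import only the selected variables.

```matlab
% Options object with three illustrative variables.
opts = xmlImportOptions('NumVariables',3);
opts.VariableNames = {'Name','Age','Year'};

% Select a subset by name.
opts.SelectedVariableNames = {'Name','Year'};
byName = opts.SelectedVariableNames;

% Select the first and second variables by numeric index.
opts.SelectedVariableNames = [1 2];
byIndex = opts.SelectedVariableNames;
```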

Type-specific variable import options, returned as an array of variable import options objects. The array contains an object corresponding to each variable specified in the VariableNames property. Each object in the array contains properties that support the importing of data with a specific data type.

Variable options support these data types: numeric, text, logical, datetime, or categorical.

To query the current (or detected) options for a variable, use the getvaropts function.

To set and customize options for a variable, use the setvaropts function.

Example: opts.VariableOptions returns a collection of VariableImportOptions objects, one corresponding to each variable in the data.

Example: getvaropts(opts,'Height') returns the VariableImportOptions object for the Height variable.

Example: opts = setvaropts(opts,'Height','FillValue',0) sets the FillValue property for the variable Height to 0.
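The query-and-set workflow with getvaropts and setvaropts might look like the sketch below. The variable names are placeholders; the Height variable is first converted to double on the assumption that a numeric FillValue requires a numeric variable type.

```matlab
% Options object with two illustrative variables.
opts = xmlImportOptions('NumVariables',2);
opts.VariableNames = {'Name','Height'};

% Make Height numeric so that a numeric FillValue is appropriate.
opts = setvartype(opts,'Height','double');

% Set the fill value for Height, then read it back.
opts = setvaropts(opts,'Height','FillValue',0);
heightOpts = getvaropts(opts,'Height');
disp(heightOpts.FillValue)
```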

Variable descriptions XPath expression, specified as a character vector or string scalar that the reading function uses to select the table variable descriptions. You must specify VariableDescriptionsSelector as a valid XPath version 1.0 expression.

Example: 'VariableDescriptionsSelector','/RootNode/ChildNode'

Table variable XPath expressions, specified as a cell array of character vectors or string array that the reading function uses to select table variables. You must specify VariableSelectors as valid XPath version 1.0 expressions.

Example: 'VariableSelectors',{'/RootNode/ChildNode'}

Example: 'VariableSelectors',"/RootNode/ChildNode"

Example: 'VariableSelectors',["/RootNode/ChildNode1","/RootNode/ChildNode2"]
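To make the role of VariableSelectors more concrete, the sketch below writes a small XML file to a temporary location and imports two variables from it, one XPath expression per variable. The file contents, node names, and variable names are invented for this illustration, and the exact shape of the result may depend on the MATLAB release.

```matlab
% Write a small illustrative XML file (contents are made up for this sketch).
xmlFile = [tempname '.xml'];
xmlText = { ...
    '<?xml version="1.0" encoding="utf-8"?>', ...
    '<Students>', ...
    '  <Student><Age>18</Age><Year>Freshman</Year></Student>', ...
    '  <Student><Age>21</Age><Year>Senior</Year></Student>', ...
    '</Students>'};
fid = fopen(xmlFile,'w');
fprintf(fid,'%s\n',xmlText{:});
fclose(fid);

% Define two variables and give each one its own XPath expression.
opts = xmlImportOptions('NumVariables',2);
opts.VariableNames = {'Age','Year'};
opts.VariableSelectors = ["/Students/Student/Age","/Students/Student/Year"];
opts = setvartype(opts,'Age','double');

% Import the selected nodes as table variables, then clean up.
T = readtable(xmlFile,opts);
disp(T)
delete(xmlFile);
```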

Variable units XPath expression, specified as a character vector or string scalar that the reading function uses to select the table variable units. You must specify VariableUnitsSelector as a valid XPath version 1.0 expression.

Example: 'VariableUnitsSelector','/RootNode/ChildNode'

Table Properties

Table row names XPath expression, specified as a character vector or string scalar that the reading function uses to select the names of the table rows. You must specify RowNamesSelector as a valid XPath version 1.0 expression.

Example: 'RowNamesSelector','/RootNode/ChildNode'

Table row XPath expression, specified as a character vector or string scalar that the reading function uses to select individual rows of the output table. You must specify RowSelector as a valid XPath version 1.0 expression.

Example: 'RowSelector','/RootNode/ChildNode'

Table data XPath expression, specified as a character vector or string scalar that the reading function uses to select the output table data. You must specify TableSelector as a valid XPath version 1.0 expression.

Example: 'TableSelector','/RootNode/ChildNode'

Set of registered XML namespace prefixes, specified as the comma-separated pair consisting of RegisteredNamespaces and an array of prefixes. The reading function uses these prefixes when evaluating XPath expressions on an XML file. Specify the namespace prefixes and their associated URLs as an Nx2 string array. RegisteredNamespaces can be used when you also evaluate an XPath expression specified by a selector name-value argument, such as StructSelector for readstruct, or VariableSelectors for readtable and readtimetable.

By default, the reading function automatically detects namespace prefixes to register for use in XPath evaluation, but you can also register new namespace prefixes using the RegisteredNamespaces name-value argument. You might register a new namespace prefix when an XML node has a namespace URL, but no declared namespace prefix in the XML file.

For example, evaluate an XPath expression on an XML file called example.xml that does not contain a namespace prefix. Specify 'RegisteredNamespaces' as ["myprefix", "https://www.mathworks.com"] to assign the prefix myprefix to the URL https://www.mathworks.com.

T = readtable("example.xml", "VariableSelectors", "/myprefix:Data",...
 "RegisteredNamespaces", ["myprefix", "https://www.mathworks.com"])

Example: 'RegisteredNamespaces',["myprefix", "https://www.mathworks.com"]

Replacement Rules

Procedure to manage missing data, specified as one of these values.

  • 'fill': Replace missing data with the contents of the FillValue property. The FillValue property is specified in the VariableImportOptions object of the variable being imported. For more information on accessing the FillValue property, see getvaropts.

  • 'error': Stop importing and display an error message showing the missing record and field.

  • 'omitrow': Omit rows that contain missing data.

  • 'omitvar': Omit variables that contain missing data.

Example: opts.MissingRule = 'omitrow';

Data Types: char | string

Procedure to handle import errors, specified as one of these values.

  • 'fill': Replace the data where the error occurred with the contents of the FillValue property. The FillValue property is specified in the VariableImportOptions object of the variable being imported. For more information on accessing the FillValue property, see getvaropts.

  • 'error': Stop importing and display an error message showing the error-causing record and field.

  • 'omitrow': Omit rows where errors occur.

  • 'omitvar': Omit variables where errors occur.

Example: opts.ImportErrorRule = 'omitvar';

Data Types: char | string

Procedure to handle repeated XML nodes in a given row of a table, specified as 'addcol', 'ignore', or 'error'.

  • 'addcol': Add columns for the repeated nodes under the variable header in the table. Specifying the value of 'RepeatedNodeRule' as 'addcol' does not create a separate variable in the table for the repeated node.

  • 'ignore': Skip importing the repeated nodes.

  • 'error': Display an error message and abort the import operation.

Example: 'RepeatedNodeRule','ignore'
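The three replacement rules can be set directly on an options object. The sketch below is one possible configuration, not a recommendation; the chosen values are arbitrary and no file is read.

```matlab
% Options object with two variables (default names).
opts = xmlImportOptions('NumVariables',2);

% Drop rows with missing data, stop on conversion errors,
% and skip repeated nodes within a row.
opts.MissingRule = 'omitrow';
opts.ImportErrorRule = 'error';
opts.RepeatedNodeRule = 'ignore';

disp(opts.MissingRule)
disp(opts.ImportErrorRule)
disp(opts.RepeatedNodeRule)
```

With 'fill' instead, the value that is substituted comes from the FillValue property of the relevant variable, which can be adjusted using setvaropts.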

Examples


Create XML import options for an XML file, specify the variables to import, and then read the data.

The XML file students.xml has seven sibling nodes named Student, which each contain the same child nodes and attributes.

type students.xml
<?xml version="1.0" encoding="utf-8"?>
<Students>
    <Student ID="S11305">
        <Name FirstName="Priya" LastName="Thompson" />
        <Age>18</Age>
        <Year>Freshman</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">591 Spring Lane</Street>
            <City>Natick</City>
            <State>MA</State>
      </Address>
      <Major>Computer Science</Major>
      <Minor>English Literature</Minor>
   </Student>
   <Student ID="S23451">
        <Name FirstName="Conor" LastName="Cole" />
        <Age>18</Age>
        <Year>Freshman</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">4641 Pearl Street</Street>
            <City>San Francisco</City>
            <State>CA</State>
        </Address>
        <Major>Microbiology</Major>
        <Minor>Public Health</Minor>
    </Student>
    <Student ID="S119323">
        <Name FirstName="Morgan" LastName="Yang" />
        <Age>21</Age>
        <Year>Senior</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">30 Highland Road</Street>
            <City>Detriot</City>
            <State>MI</State>
        </Address>
        <Major>Political Science</Major>
   </Student>
   <Student ID="S201351">
        <Name FirstName="Salim" LastName="Copeland" />
        <Age>19</Age>
        <Year>Sophomore</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">3388 Moore Avenue</Street>
            <City>Fort Worth</City>
            <State>TX</State>
        </Address>
        <Major>Business</Major>
        <Minor>Japanese Language</Minor>
   </Student>
   <Student ID="S201351">
        <Name FirstName="Salim" LastName="Copeland" />
        <Age>20</Age>
        <Year>Sophomore</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">3388 Moore Avenue</Street>
            <City>Fort Worth</City>
            <State>TX</State>
        </Address>
        <Major>Business</Major>
        <Minor>Japanese Language</Minor>
    </Student>
    <Student ID="54600">
        <Name FirstName="Dania" LastName="Burt" />
        <Age>22</Age>
        <Year>Senior</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">22 Angie Drive</Street>
            <City>Los Angeles</City>
            <State>CA</State>
        </Address>
        <Major>Mechanical Engineering</Major>
        <Minor>Architecture</Minor>
   </Student>
    <Student ID="453197">
        <Name FirstName="Rikki" LastName="Gunn" />
        <Age>21</Age>
        <Year>Junior</Year>
        <Address>
            <Street xmlns="https://www.mathworks.com">65 Decatur Lane</Street>
            <City>Trenton</City>
            <State>ME</State>
        </Address>
        <Major>Economics</Major>
        <Minor>Art History</Minor>
   </Student>
</Students>

Create an XMLImportOptions object. Specify the value of VariableSelectors as //@FirstName to select the FirstName attribute to import as a table variable.

opts = xmlImportOptions("VariableSelectors","//@FirstName")
opts = 
  XMLImportOptions with properties:

   Replacement Properties:
                     MissingRule: "fill"
                 ImportErrorRule: "fill"
                RepeatedNodeRule: "addcol"

   Variable Import Properties: Set types by name using setvartype
                   VariableNames: "Var1"
                   VariableTypes: "char"
           SelectedVariableNames: "Var1"
                 VariableOptions: Show all 1 VariableOptions 
	Access VariableOptions sub-properties using setvaropts/getvaropts
              VariableNamingRule: "preserve"

   Location Properties:
                   TableSelector: <missing>
                     RowSelector: <missing>
               VariableSelectors: "//@FirstName"
           VariableUnitsSelector: <missing>
    VariableDescriptionsSelector: <missing>
                RowNamesSelector: <missing>
            RegisteredNamespaces: [0x2 string]

Use readtable along with the options object to import the specified variable.

T = readtable("students.xml",opts)
T=7×1 table
       Var1   
    __________

    {'Priya' }
    {'Conor' }
    {'Morgan'}
    {'Salim' }
    {'Salim' }
    {'Dania' }
    {'Rikki' }

Register a custom XML namespace prefix to the existing namespace URL in the input file using the RegisteredNamespaces name-value argument.

Create an XMLImportOptions object from an XML file. Specify the XPath expression of the Street element node as the value of 'VariableSelectors', and register the prefix myPrefix to the URL belonging to the Street node.

opts = detectImportOptions("students.xml","RegisteredNamespaces", ["myPrefix","https://www.mathworks.com"],...
    "VariableSelectors","//myPrefix:Street")
opts = 
  XMLImportOptions with properties:

   Replacement Properties:
                     MissingRule: "fill"
                 ImportErrorRule: "fill"
                RepeatedNodeRule: "addcol"

   Variable Import Properties: Set types by name using setvartype
                   VariableNames: "Street"
                   VariableTypes: "string"
           SelectedVariableNames: "Street"
                 VariableOptions: Show all 1 VariableOptions 
	Access VariableOptions sub-properties using setvaropts/getvaropts
              VariableNamingRule: "preserve"

   Location Properties:
                   TableSelector: <missing>
                     RowSelector: <missing>
               VariableSelectors: "//myPrefix:Street"
           VariableUnitsSelector: <missing>
    VariableDescriptionsSelector: <missing>
                RowNamesSelector: <missing>
            RegisteredNamespaces: ["myPrefix"    "https://www.mathworks.com"]

Use the readtable function along with the options object to import the selected variable.

T2 = readtable("students.xml",opts)
T2=7×1 table
          Street       
    ___________________

    "591 Spring Lane"  
    "4641 Pearl Street"
    "30 Highland Road" 
    "3388 Moore Avenue"
    "3388 Moore Avenue"
    "22 Angie Drive"   
    "65 Decatur Lane"  

Tips

  • The following XPath syntaxes are supported for XPath selector name-value arguments, such as RowSelector or VariableSelectors.

    • To select every node whose name matches the node you want to select, regardless of its location in the document, use the "//myNode" syntax. You can use "//myNode" to omit the XPath expression that precedes the node that you want to select.

    • To read one of several sibling nodes under one parent node in the file, you can specify ChildNode[n], where n corresponds to the sibling node that you want to index. For example, the path "/RootNode/ChildNode[2]" selects the second ChildNode element whose parent is RootNode.

    • To read the value of an attribute belonging to an element node in the input XML file, specify @ before the name of the attribute. For example, "/RootNode/ChildNode[2]/@AttributeName" selects the attribute AttributeName belonging to the second ChildNode element whose parent is RootNode.
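The three XPath forms listed above can be sketched as VariableSelectors values. The node and attribute names (Students, Student, Age, ID) are assumptions modeled on a students-style file; each options object would then be passed to readtable together with a matching XML file.

```matlab
% Select every Age node, regardless of its location in the document.
optsAll = xmlImportOptions("VariableSelectors","//Age");

% Select the Age node under the second Student element.
optsSecond = xmlImportOptions("VariableSelectors","/Students/Student[2]/Age");

% Select the ID attribute of the second Student element.
optsAttr = xmlImportOptions("VariableSelectors","/Students/Student[2]/@ID");

disp(optsAll.VariableSelectors)
disp(optsSecond.VariableSelectors)
disp(optsAttr.VariableSelectors)
```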

Introduced in R2021a