File Exchange

xml2struct

version 1.8.0.0 (2.94 KB) by Wouter Falkena
Convert an xml file into a MATLAB structure for easy access to the data.

245 Downloads

Updated 15 May 2012

Cite As

Wouter Falkena (2020). xml2struct (https://www.mathworks.com/matlabcentral/fileexchange/28518-xml2struct), MATLAB Central File Exchange. Retrieved .

Comments and Ratings (171)

Daniel Stoekl

I have been using this great code for a long time, but see now that the following xml is not parsed correctly: http://www.steinheim-institut.de:80/cgi-bin/epidat?id=ad2-179-teip5
E.g. the TEI.facsimile contains two children <graphic> but only the first one gets imported.
TEI.teiHeader.profileDesc contains two children <language> after a comment. The comment is imported but neither of the two <language> elements is.
In TEI.text.body.div.div none of the many <lb> children gets imported.

Julien Kollmann

Thomas Zirkle

Cleve Moler

Thanks. Very handy.
To read from URLs, add this between lines 38 and 39.
elseif contains(file,'http')
    xDoc = xmlread(file);
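
For readers using the Python port posted in this thread, the same file-or-URL branch might be sketched as follows. The helper names `looks_like_url` and `load_xml_root` are hypothetical, and checking the URL scheme is an assumption: the MATLAB snippet above only tests whether the name contains 'http'.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse
from urllib.request import urlopen


def looks_like_url(source):
    """Return True when source has an http or https scheme (assumed rule)."""
    return urlparse(source).scheme in ("http", "https")


def load_xml_root(source):
    """Parse XML from a URL or a local file path and return the root element."""
    if looks_like_url(source):
        with urlopen(source) as response:  # network access happens here
            return ET.parse(response).getroot()
    return ET.parse(source).getroot()
```

Local paths take the second branch, so existing calls with a file name behave as before.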

Emilia Lalander

Perfect!

Chengjun Tang

Nice!

Ernst Uzhanskii

Chen Wang

Nice work!

Gabriele Mosaico

Laio Marinheiro

What's up guys! I've noticed we already have some solutions for the "out of memory" bug reported in this thread. Well, I've managed to work around this issue by implementing analogous code in Python. This may help someone. The code follows below:

### Import libs
import xml.etree.ElementTree as ET
import numpy as np
import scipy.io as spy

### xmltools class definition
class xmltools():
    '''
    2020.04
    @author: Laio Marinheiro
    Organize xml data structure as a python dict (which is analogous to a matlab
    struct). This is inspired by xml2struct.m in matlab.
    This also uses
    '''

    def LoadXML(self, xml_filename):
        # Parse the file and return the root element
        xml_tree = ET.parse(xml_filename)
        xml_root = xml_tree.getroot()
        return xml_root

    def ChildProp(self, parent):
        child_name = [] # name of each child
        unique_child_names = [] # unique name of children
        child_len = {} # length of each child (number of subchildren of each child)
        child_idx = {} # index of the child on parent
        k = 0
        for child in parent:
            child_name.append(child.tag)
            child_len[child.tag] = len(child) # irrelevant =/
            child_idx[child.tag] = k
            if child.tag not in unique_child_names:
                unique_child_names.append(child.tag)
            k = k + 1
        # Overwrite 'len' with the number of children sharing each tag
        for unique_name in unique_child_names:
            k = 1
            for name in child_name:
                if name == unique_name:
                    child_len[unique_name] = k
                    k = k + 1
        ChildProp = {}
        ChildProp['names'] = child_name
        ChildProp['unique_names'] = unique_child_names
        ChildProp['len'] = child_len
        ChildProp['index'] = child_idx
        return ChildProp

    def GetDict(self, parent):
        Dict = {}
        if len(parent) == 0: # parent is the last field
            if parent.text is not None:
                Dict['Text'] = parent.text
            else:
                Dict['Text'] = '' # empty string
        else: # if parent is not the last, it has children
            # use self here; the original relied on a global named toolbox
            childprop = self.ChildProp(parent)
            if len(parent) == len(childprop['index']): # children are struct fields
                for child in parent:
                    Dict[child.tag] = self.GetDict(child)
            else: # children are elements of list (or lists)
                unique_names = list(childprop['index'].keys())
                for list_name in unique_names: # for each list
                    # dtype=object: the np.object alias is gone from recent NumPy
                    Dict[list_name] = np.zeros((childprop['len'][list_name],), dtype=object)
                    k = 0
                    for child in parent:
                        if child.tag == list_name:
                            Dict[list_name][k] = self.GetDict(child)
                            k = k + 1
        if parent.attrib != {}:
            Dict['Attributes'] = parent.attrib
        return Dict

    def Dict2Mat(self, xml_filename, dictionary):
        # Save next to the input file, replacing the 4-character extension with .mat
        spy.savemat(xml_filename[0:-4]+'.mat', mdict={'pythonxml': dictionary})

    def xml2struct(self, xml_filename):
        xml_root = self.LoadXML(xml_filename)
        xml_dict = {xml_root.tag:{}}
        xml_dict[xml_root.tag] = self.GetDict(xml_root)
        self.Dict2Mat(xml_filename, xml_dict)

### Example
toolbox = xmltools()
xml_filename_str = 'your_filename.xml'
toolbox.xml2struct(xml_filename_str)

Nicholas Battaglia

Marcin Konowalczyk

Ernst Uzhanskii

Hyunjin Paek

Anne-Laure Guinet

Thanks!

Aryaa R

Thi Ho

svanimisetti

svanimisetti

cem polat

Emmanouil

Emmanouil

Valeriy

Radu Oprea

Michael Klukinov

Nicolas

Chase Nelson

Constantin Siriteanu

Paul Witsberger

Asbjørn Berge

Urvashi Pal

Furkan Küçük

Better than mathworks functions!

Hajo Kleingeld

It works, and is easy to use!

gareth lloyd

Worked straight out of the box.

Yii-Lih Lin

thank you!!!

Rostislav Teryaev

great work! Thank you

Bo Deng

Farhod Mahmudkhojaev

Sowmiya Raksha Naik

where can I find output file of xml2struct

Gianluca Acunzo

Yang Jiao

powerful and accurate

Kiarash Ahi

I wonder if there is any python equivalent of this script?

Kerem Pekergin

Raj Tailor

Laurens Bliek

Lianting Hu

well done!

tanfeng

Thanks a lot!

XIN TANG

Stefan Smit

Great stuff! Thanks a lot!

nealm

Noelia Suarez

This function works perfectly! Thank you very very much!

vc

Great work! Thanks

Carlos Castedo

Works perfectly for me, this should be part of MATLAB

adam zhang

Mauro

should be part of MATLAB

Simon Seitz

Meade

Works beautifully and is robust.
Nice work and Thanks!

Joerg Buchholz

A King

Nirav Ambaliya

Super. Great work.

Marc Timmer

Nice! Exactly what I was looking for!

Jeffrey Kern

Nice function. Thank you!

Marian Kersting

Maksym Sich

William R

Nice submission!

My advice would be to replace lines 137, 138 and 139 with:
name = matlab.lang.makeValidName(name);

This MATLAB function makes sure that all characters that are not valid in a MATLAB name are replaced with an underscore.
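
The same idea, written for the Python port in this thread, could look like the sketch below. `make_valid_name` is a hypothetical helper, not a reimplementation of `matlab.lang.makeValidName`: the `x` prefix and the truncation to 63 characters (the field-name limit mentioned elsewhere in these comments) are assumptions.

```python
import re

MAX_NAME_LENGTH = 63  # field-name limit reported elsewhere in this thread


def make_valid_name(name):
    """Replace characters outside [A-Za-z0-9_] and force a leading letter."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    if not cleaned or not cleaned[0].isalpha():
        cleaned = "x" + cleaned  # assumed prefix so the name starts with a letter
    return cleaned[:MAX_NAME_LENGTH]
```

Names that differ only in their invalid characters map to the same result, so a caller that needs unique field names still has to check for collisions.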

Konst Apost

Marsa Taheri

Matthias Zunhammer

Nice, should be included in MATLAB by default.

Tian Zhang

Anand Saran

Dhiraj Bhandary

Bridget Tannian

Rakesh Jasti

zheng qiu

Laio Marinheiro

Jeremy Benichou

Philipp Glira

NenaV

Seems to be working well - just wondering how to save the structure back to an XML again? I couldn't find this mentioned in the comments.

Rich Sykes

Sven Martin

very useful for modifying xml files

M.Abuasbeh

Paulo Fonte

Julian Hapke

really useful script, but rather slow on large xmls, the xmlread only used 1/10th of the overall time.
My improvement idea:
change
if (~isempty(regexprep(text.(textflag),'[\s]*','')))
to
if ~all(isspace(text.(textflag)))
and get an overall speedup of factor 2 (in my test case at least)
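
The change swaps a check that builds a stripped copy of the string for one that only inspects its characters. A minimal Python sketch of the two variants, for the port posted in this thread, shows that they agree on typical inputs. The function names are hypothetical and no timing claim is made for Python.

```python
import re


def is_blank_regex(text):
    """Blank test that builds a stripped copy first."""
    return re.sub(r"\s*", "", text) == ""


def is_blank_scan(text):
    """Blank test that only inspects characters, with no new string built."""
    return text == "" or text.isspace()


samples = ["", "   ", "\n\t ", " a ", "data"]
results = [(is_blank_regex(s), is_blank_scan(s)) for s in samples]
```

The explicit empty-string case is needed in Python because `"".isspace()` is False.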

Alexey R.

Thank you so so much!

Eike Ullrich

thanks a lot, great work

Tianlong ma

thank you

Matthew Murphy

I had to make a few modifications to get my XML file to work. I will put them below, but as this is my first time using this file type, mileage may vary.

Line 95 in version current as of 3/7/2017

children.(name) = text;

That overwrote all of the child nodes that had data stored in them, given that the last node to be parsed was a comment (i.e. it only contained a string). Other nodes contained numerical values held as a string value. Here was my fix:

if isfield(text,'Text')
    children.(name) = str2num(text.Text);
else
    children.(name).('Comment') = text.Comment;
end

Overall very helpful, and was exactly what I needed after I put the replacement lines in.

Thanks!

Mohamed Mousa

Yifei Wang

That's Great!!! Thank you so much!!!

Hasenearl

ChrisDz

This is a very useful script! I used it for reading AUTOSAR XML files! While reading AR-XML files I found two challenges:
1. The long replacement texts for {'-'|':'|'.'} within XML tags lead to the problem that MATLAB field names become longer than 63 chars! I reduced them to {'_'|'c'|'d'}! That helped!
2. When XML comments <!-- Comment --> are used inside the XML file, the script fails! I fixed this issue! Have a look at it!

Replace: Inside the function: parseChildNodes(...)
% CDz 2016-12-21 Commented Out Due to problems with
% XML-Comments
% if(~isempty(fieldnames(text)))
%     children.(name){index} = text;
% end

% CDz 2016-12-21 Added to Handle XML - Comments
if(~isempty(text) && isstruct(text))
    if find(strcmp(fieldnames(text),'Text'))
        children.(name){index}.('Text') = text.Text;
    elseif find(strcmp(fieldnames(text),'Comment'))
        children.(name){index}.('Comment') = text.Comment;
    end
end

and

% CDz 2016-12-21 Commented out due to problems with
% XML-Comments
% if(~isempty(text) && ~isempty(fieldnames(text)))
%     children.(name) = text;
% end

% CDz 2016-12-21 Added to Handle XML - Comments
if(~isempty(text) && isstruct(text))
    if find(strcmp(fieldnames(text),'Text'))
        children.(name).('Text') = text.Text;
    elseif find(strcmp(fieldnames(text),'Comment'))
        children.(name).('Comment') = text.Comment;
    end
end

@ Wouter Falkena: If you are interested, I can provide you with a full copy of this file so that you can update this script

Rody Oldenhuis

Well done

Hannes Mogensen

Simple and very useful. Very convenient.

Luma AL-HARBAWEE

I am a PhD student and I need to apply this function in my code to get the attributes of an XML file

Mary Ann Harrison

Mike Wehr

Works Great! Tried xml_toolbox but it's broken since 2014. This is a solid replacement.

Xiomara Herrera

please I need to edit a xml file

Anas Imran

CY Y

I have looked through the issues, implemented fixes, added some new feature to the script, and uploaded it here : https://www.mathworks.com/matlabcentral/fileexchange/58700-xml2struct . Please also try my updated version and let me know if it works better now.

Anael

Neil's fix doesn't do the trick for me...

Anael

Doesn't work right out of the box. Stéphane's fix works great!
Also same issue as Sebastien regarding comments and headers.

Keith Hooks

I received a "java.lang.OutOfMemoryError: GC overhead limit exceeded" when trying to open a Kanji dictionary file - http://www.edrdg.org/kanjidic/kanjidic2.xml.gz

Anael

Dominik Roszkowski

Good, but rather slow for large xml files.

Karan Gill

Julian Cieplik

yoav

Daniel

Can you update the code? I am having the same problem.

RONAK KOSTI

Richard Uhlmann

Thanks for this very flexible script! TOP!

zepp

this is exactly what I was looking for.

Daniel

Thanks Wouter,

It's not exactly the same, but I appreciate your response. The problem is solved now. I may upload the code if someone has the same problem.

Stéphane

Hi,

Found a bug : When there is text and children in the same node, the text overwrites the children.

Fix:
Replace

if(~isempty(fieldnames(text)))
    children.(name){index} = text;
end

by:

if isstruct(text)
    for fld=fieldnames(text)'
        children.(name){index}.(fld{1}) = text.(fld{1});
    end
end

And replace also:

if(~isempty(text) && ~isempty(fieldnames(text)))
    children.(name) = text;
end

by:

if isstruct(text)
    for fld=fieldnames(text)'
        children.(name).(fld{1}) = text.(fld{1});
    end
end

Thanks.
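
The difference between the two patterns can be illustrated with plain Python dicts standing in for MATLAB structs. This is a sketch of the idea only: `attach_overwrite` and `attach_merge` are hypothetical names, and the sample data is made up.

```python
def attach_overwrite(children, name, child, text):
    """Buggy pattern: any text fields replace the child's own content."""
    children[name] = child
    if text:
        children[name] = text
    return children


def attach_merge(children, name, child, text):
    """Fixed pattern: text fields are copied into the child one by one."""
    children[name] = dict(child)
    for field, value in text.items():
        children[name][field] = value
    return children


child = {"item": {"Text": "1"}}   # a node that already has a child element
text = {"Text": "note"}           # text found in the same node
lost = attach_overwrite({}, "node", child, text)
kept = attach_merge({}, "node", child, text)
```

With the overwrite pattern `lost["node"]` holds only the text, while `kept["node"]` holds both the `item` child and the text.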

Armin Ghasem Azar

superaga

Prabakaran R

Simple to use and works great !! Thanks for sharing the work.

moi

Ilya Belevich

Uri Cohen

A

It takes a long time to run for larger XMLs. Is there anywhere in the code I can add a waitbar to at least report progress to the user? I tried the two for loops but that does not seem to be the bottleneck.

Benjamin Falk

Joerg

Kevin

Very simple to use and it works.

Alex

"Andrew Wilson: The fix from Neill Weiss in an earlier comment/review seems to solve this, so it would be great to see that incorporated into an update!" thanks Andrew Wilson

Andrew Wilson

Works great for the most part, but the issue of nodes being lost when comments are present at the same level of the hierarchy is quite frustrating. The fix from Neill Weiss in an earlier comment/review seems to solve this, so it would be great to see that incorporated into an update!

Adam Wyatt

Seems to work fine except as reported by Sebastien Roy on 09/10/14 - xml comments don't work (resulting in a loss of the other data)

William Murphy

Downloaded this file this evening to process some XML data. worked just fine.

Bernhard

Sorry, pasted the wrong line.
Here is line 154 that fixes the problem for me:

text.(textflag) = char(getTextContent(theNode));

Bernhard

Great stuff.

Regarding that "Undefined function 'toCharArray' for input arguments of type 'double'." Error:

For me it worked to change line 154 into
text.(textflag) = char(getData(theNode))';
as it has been in an earlier version of xml2struct (mentioned in the comments in the code in line 153)

Chris FUNG

Sebastien Roy

Great time saver when compared to using xmlread directly. However, there is a bug with child nodes when a text is present. The child node content will be set to the text and all other content of the child will be lost. A comment, being processed as text, will cause the same issue. Attempting to read this xml will not provide the expected result:

<?xml version="1.0" encoding="UTF-8"?>

<root>
<!-- Should be a benign comment -->
<mystuff>Valuable data</mystuff>
</root>
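
To see why a comment can interfere, it helps to list what a DOM parser returns for this file. The sketch below uses Python's `xml.dom.minidom` as a stand-in for the DOM that `xmlread` produces; that the two lay out nodes the same way is an assumption.

```python
from xml.dom import minidom

XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<root>
<!-- Should be a benign comment -->
<mystuff>Valuable data</mystuff>
</root>"""

root = minidom.parseString(XML).documentElement

# Whitespace and comments show up as sibling nodes of <mystuff>.
node_names = [node.nodeName for node in root.childNodes]

# Keep element children only, so comments and whitespace cannot overwrite them.
elements = {
    node.nodeName: node.firstChild.data
    for node in root.childNodes
    if node.nodeType == node.ELEMENT_NODE
}
```

Here `node_names` contains `#text` and `#comment` entries next to `mystuff`, so a parser that stores every sibling under the same parent field has to merge them rather than assign them.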

Simon du Plooy

Some of the attributes in the XML file had underscores at the beginning, which caused an error because they are not allowed as field names. A simple strrep solved the problem.

Great!

Excellent

Fredrik

Rody Oldenhuis

Timo Dörsam

simbaforrest

Anders Bergåker

Mark Mikofski

Stop using XML and use the json.org/java [1] static XML.toJSONObject() method [2]; there's a precompiled jar file in my dropbox [3]. Or use James Newton-King's Json.NET [4], which is already precompiled by him and available from codeplex [5]: just download and unzip, then use the version for the .NET framework on your machine. Converting between XML and JSON is described in the documentation [6] and in this SO post [7]. See the MATLAB documentation for more information on using Java [8] or .NET [9] in MATLAB. It's super easy!
[1] http://json.org/java/
[2] http://json.org/javadoc/org/json/XML.html#toJSONObject(java.lang.String)
[3] https://dl.dropboxusercontent.com/u/19049582/JSON.jar
[4] http://james.newtonking.com/pages/json-net.aspx
[5] https://json.codeplex.com/
[6] http://james.newtonking.com/projects/json/help/index.html?topic=html/ConvertingJSONandXML.htm
[7] http://stackoverflow.com/a/814027/1020470
[8] http://www.mathworks.com/help/matlab/using-java-libraries-in-matlab.html
[9] http://www.mathworks.com/help/matlab/using-net-libraries-in-matlab.html

Fábio Nery

I've seen some other users report this issue but could not find how to fix this:

Undefined function 'toCharArray' for input arguments of type 'double'.

Any idea?

Regards

Varoujan

Works well.
Didn't fully test for empty field cases like some commenters but I got a nice structure out of my input file.

I am disappointed that similar functionality isn't built into MATLAB. With xmlread and xmlwrite alone, it is such a pain to access and/or update XML data.

Adam

Hi,
Thanks for the file, it works great.
But I also have the same problem as Erik with empty data fields. Does anyone know how to fix this?

Yu

Faster than xml_read, recommended!

Erik

Thanks for the file, however I'm having an issue with empty data fields.

I have a 100x50 XML data set which I can easily import into Excel. However, a few fields are empty. For example, at (5,35:40) the XML data is empty.

When I use xml2struct and then try to create a cell array in the same format (100x50), the data in row 5 between 40:50 shifts to the 35:45 position, leaving 5 empty spaces from 45:50, so the data is misaligned.

Any idea on how to deal with empty fields in order to maintain their position in the original file?
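
One way to keep positions stable is to emit a placeholder for every element, including empty ones, instead of skipping them. A minimal Python sketch with `xml.etree.ElementTree` follows; the `<table>/<row>/<cell>` layout is hypothetical, since the actual file is not shown in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: one <row> per record, one <cell> per column.
XML = "<table><row><cell>1</cell><cell/><cell>3</cell></row></table>"

root = ET.fromstring(XML)

# An empty element has text None; map it to "" so the column count is kept.
rows = [[cell.text or "" for cell in row] for row in root.iter("row")]
```

Each row keeps one entry per `<cell>`, so the third value stays in the third column.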

Thanks!

Rosie Vakasilimi

I was just wondering if someone could confirm that what I am doing is correct. When I want to convert XML into a MATLAB array, I type:
data=xml2struct('name of the file i want to convert'); Is that all?

Michael Pelz-Sherman

We are encountering the same issue reported by Raoul Herzog: Undefined function or method 'toCharArray' for input arguments of type 'double'. Is there a fix for this?

Neill Weiss

For the comment bug, @Sirius3, I changed the following code block from:

if (~strcmp(name,'#text') && ~strcmp(name,'#comment') && ~strcmp(name,'#cdata_dash_section'))
    %XML allows the same elements to be defined multiple times,
    %put each in a different cell
    if (isfield(children,name))
        if (~iscell(children.(name)))
            %put existing element into cell format
            children.(name) = {children.(name)};
        end
        index = length(children.(name))+1;
        %add new element
        children.(name){index} = childs;
        if(~isempty(fieldnames(text)))
            children.(name){index} = text;
        end
        if(~isempty(attr))
            children.(name){index}.('Attributes') = attr;
        end
    else
        %add previously unknown (new) element to the structure
        children.(name) = childs;
        if(~isempty(text) && ~isempty(fieldnames(text)))
            children.(name) = text;
        end
        if(~isempty(attr))
            children.(name).('Attributes') = attr;
        end
    end
else

to

if (~strcmp(name,'#text') && ~strcmp(name,'#comment') && ~strcmp(name,'#cdata_dash_section'))
    %XML allows the same elements to be defined multiple times,
    %put each in a different cell
    if (isfield(children,name))
        if (~iscell(children.(name)))
            %put existing element into cell format
            children.(name) = {children.(name)};
        end
        index = length(children.(name))+1;
        %add new element
        children.(name){index} = childs;
        textFieldNames = fieldnames(text);
        for t = 1:length(textFieldNames)
            textFieldName = textFieldNames{t};
            children.(name){index}.(textFieldName) = text.(textFieldName);
        end
        if(~isempty(attr))
            children.(name){index}.('Attributes') = attr;
        end
    else
        %add previously unknown (new) element to the structure
        children.(name) = childs;
        if(~isempty(text) && ~isempty(fieldnames(text)))
            textFieldNames = fieldnames(text);
            numTextFieldNames = length( textFieldNames );
            for i = 1:numTextFieldNames
                thisFieldName = textFieldNames{i};
                children.(name).(thisFieldName) = text.(thisFieldName);
            end
        end
        if(~isempty(attr))
            children.(name).('Attributes') = attr;
        end
    end
else

Now, the children.(name) properties are not blown away when a comment is parsed.

Sirius3

bug: child nodes get lost, when there are comments between them. (line 95)

Gledi

First of all, thanks for the excellent code.
I have a "small" problem regarding the cells. In your code, if there is MORE THAN ONE child, then you create a cell, otherwise not. What should I change so that even if the node has ONLY ONE child, a cell (with one element) is created?
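
For comparison, the Python port in this thread could get the always-a-list behaviour by grouping children by tag before converting them. The sketch below shows only the grouping step; `group_children` is a hypothetical helper and not part of xml2struct.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_children(parent):
    """Map each child tag to a list of elements, even for a single child."""
    grouped = defaultdict(list)
    for child in parent:
        grouped[child.tag].append(child)
    return dict(grouped)


root = ET.fromstring("<r><a>1</a><b>2</b><b>3</b></r>")
groups = group_children(root)
```

Downstream code can then always index into a list, at the cost of an extra level of indexing for elements that occur once.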

Matthew

Worked very well for me. Thank you so much.

Raoul Herzog

There seems to be a bug in xml2struct :
I can provide you the corresponding xml file if needed.

??? Undefined function or method 'toCharArray' for input arguments of type 'double'.

Error in ==> xml2struct>parseAttributes at 174
str = toCharArray(toString(item(theAttributes,count-1)))';

Error in ==> xml2struct>getNodeData at 141
attr = parseAttributes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
[text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct>getNodeData at 147
[childs,text,textflag] = parseChildNodes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
[text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct>getNodeData at 147
[childs,text,textflag] = parseChildNodes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
[text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct>getNodeData at 147
[childs,text,textflag] = parseChildNodes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
[text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct>getNodeData at 147
[childs,text,textflag] = parseChildNodes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
[text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct>getNodeData at 147
[childs,text,textflag] = parseChildNodes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
[text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct>getNodeData at 147
[childs,text,textflag] = parseChildNodes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
[text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct>getNodeData at 147
[childs,text,textflag] = parseChildNodes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
[text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct>getNodeData at 147
[childs,text,textflag] = parseChildNodes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
[text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct>getNodeData at 147
[childs,text,textflag] = parseChildNodes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
[text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct>getNodeData at 147
[childs,text,textflag] = parseChildNodes(theNode);

Error in ==> xml2struct>parseChildNodes at 72
[text,name,attr,childs,textflag] = getNodeData(theChild);

Error in ==> xml2struct at 57
s = parseChildNodes(xDoc);

Xiaohu

Ivan Smirnov

One of the problems that I personally encountered is that xml2struct can't handle CDATA blocks.

It can be easily fixed, replace line 67 with:
if (~strcmp(name,'#text') && ~strcmp(name,'#comment') && ~strcmp(name,'#cdata_dash_section'))
and line 94 with:
elseif (strcmp(name,'#text') || strcmp(name, '#cdata_dash_section'))
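
The field name `#cdata_dash_section` appears to come from the DOM node name `#cdata-section` after the script's renaming of `-`. The Python sketch below uses `xml.dom.minidom` to show the DOM side; the `_dash_` replacement rule is an assumption inferred from the names in the fix above.

```python
from xml.dom import minidom

doc = minidom.parseString(b"<r><![CDATA[a < b]]></r>")
node = doc.documentElement.firstChild

dom_name = node.nodeName                      # name reported by the DOM
field_name = dom_name.replace("-", "_dash_")  # assumed renaming rule
content = node.data                           # CDATA content, kept verbatim
```

Treating this node like `#text`, as the fix above does, keeps the CDATA content instead of handling it as an unknown child element.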

Works great otherwise, thanks.

ali

Excellent! I was pulling my hair to read to numbers from XML file and with this I did it in one minute

Kevin Moerman

Works great for small files. I tested it for some larger files with >100000 entries and this takes around 178 seconds.

Kevin Moerman

Brad

Wouter Falkena

Thank you for this suggestion Mr. Wanner. I have updated the file and it is currently under review by the MATLAB Central. It will appear here shortly.

Adrian Wanner

Thanks for your work.
You might want to speed up the attribute parsing by about 40% by replacing lines 152-154 by the following:
str=theAttributes.item(count-1).toString.toCharArray()';
k=strfind(str,'=');
attr_name = regexprep(str(1:(k(1)-1)),'[-:.]','_');
attributes.(attr_name) =str((k(1)+2):(end-1));
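
The snippet relies on the attribute string having the form name="value" and splits it at the first '='. The same steps, sketched in Python for the port in this thread (`parse_attribute` is a hypothetical name, and the 0-based indices are shifted by one relative to the MATLAB code):

```python
import re


def parse_attribute(text):
    """Split a string of the form name="value" at the first '='."""
    k = text.index("=")
    name = re.sub(r"[-:.]", "_", text[:k])
    value = text[k + 2:-1]  # skip '=' and the opening quote, drop the closing quote
    return name, value
```

Because only the first '=' is used, values that themselves contain '=' are kept intact.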

Mark

Thanks, your auto field naming system worked great for me to work with data parsed out from XML files.

Bernard

Thanks a lot! I finally came across a tool that can extract info from a ISO19115/19139 xml file.

Joao Henriques

Simple and works pretty well! The structures are a bit verbose but they're supposed to be parsed by my program anyway; any attempts to collapse some of the nested structures would only slow down the code (some similar submissions do this but are much slower). Thanks!

Krishnan Suresh

Thanks v. much! I used it to read a Collada file (geometry file Google Sketch-up). Worked like a charm!

Wouter Falkena

You are correct. I have removed the '.xml' extension assumption, unless the file cannot be found. The updated file is currently under review by MATLAB Central and should appear here soon.

Mathieu

Warning: not all XML files have a '.xml' extension

Joanne

Worked on the first try for loading an OSM data file.

TideMan

I was tearing my hair out trying to figure out how to automatically access one tiny piece of data in a .xml file until I found this routine.

Yanai

MATLAB Release Compatibility
Created with R2009b
Compatible with any release
Platform Compatibility
Windows macOS Linux
