Unable to read a huge XML or text file

JFz on 24 Jul 2017
Commented: Santa Raghavan on 27 Jul 2017
Hi,
I have an XML file that is 2 GB in size. I keep getting a Java heap memory error when loading it, so I am thinking of reading it in as a text file and removing the many useless rows before saving the result to a new, smaller file.
How can I do that? I cannot even open it with TextPad. Thanks!

Answers (1)

Santa Raghavan on 26 Jul 2017
Edited: Santa Raghavan on 26 Jul 2017
You can increase the amount of Java heap memory available to MATLAB as follows.
In the MATLAB Desktop window:
For MATLAB R2010a and later, go to File -> Preferences -> General -> Java Heap Memory and move the slider to adjust the allocated heap memory.
For versions of MATLAB prior to R2010a, refer to the link below.
If that does not work, you can read the file as text with the textscan function, specifying how many rows to read per call (the block size).
fileID = fopen('bigfile.txt');
formatSpec = '%s %f %*f %*f %s';
N = 1000; % block size: number of rows to read per call (arbitrary example value)
Read one block of data from the file. The third argument tells textscan to apply formatSpec at most N times, and the HeaderLines name-value pair tells it to skip two lines before reading data.
D = textscan(fileID,formatSpec,N,'HeaderLines',2,'Delimiter','\t')
fclose(fileID); % close the file once all blocks have been read
For more information, see: Import large text files
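For the original goal of dropping unwanted rows and saving a smaller file, a line-by-line pass with fgetl avoids holding the whole file in memory. The following is only a minimal sketch under stated assumptions: the file names are placeholders, the sample input is created just so the sketch runs on its own, and the isUseless rule (drop blank lines and lines containing a hypothetical <unwanted> tag) stands in for whatever makes a row useless in your file.

```matlab
% Hypothetical file names; replace with your own paths.
inName  = 'sample_input.xml';
outName = 'sample_output.xml';

% Create a tiny sample input so the sketch is self-contained.
% Skip this step when working on a real file.
fid = fopen(inName, 'w');
fprintf(fid, '<root>\n<unwanted>a</unwanted>\n\n<keep>b</keep>\n</root>\n');
fclose(fid);

% Hypothetical filter: a line is "useless" if it is blank
% or contains the tag <unwanted>.
isUseless = @(s) isempty(strtrim(s)) || ~isempty(strfind(s, '<unwanted>'));

fin = fopen(inName, 'r');
if fin == -1
    error('Could not open %s for reading.', inName);
end
fout = fopen(outName, 'w');
if fout == -1
    fclose(fin);
    error('Could not open %s for writing.', outName);
end

nKept = 0;
currentLine = fgetl(fin);        % fgetl returns numeric -1 at end of file
while ischar(currentLine)
    if ~isUseless(currentLine)
        fprintf(fout, '%s\n', currentLine);   % copy the line to the smaller file
        nKept = nKept + 1;
    end
    currentLine = fgetl(fin);
end

fclose(fin);
fclose(fout);
fprintf('Kept %d lines.\n', nKept);
```

Because only one line is held at a time, memory use stays small regardless of file size. Note that filtering XML by line only produces valid XML if the dropped lines are complete elements, which is an assumption about how the file is laid out.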
2 Comments
JFz on 27 Jul 2017
Thanks! I will try it. I increased the Java heap memory to the maximum but still got the same error.
Santa Raghavan on 27 Jul 2017
You can also try the datastore function, which lets you read files that don't fit in memory.
ds = datastore('Myfile.xml', ...
    'TreatAsMissing','NA')
ds.ReadSize = 100; % Number of rows to read at a time
read(ds) % Reads the first 100 rows in the file
read(ds) % Reads the next 100 rows in the file
Each subsequent read call on ds fetches data starting from where the previous read stopped.
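To walk through an entire file this way, the read calls are usually wrapped in a loop that stops when hasdata reports nothing is left. The sketch below is an illustration under assumptions, not a recipe for the XML file in the question: it builds a small comma-delimited sample file (hypothetical name sample_table.csv) because a datastore created this way expects tabular text, and an arbitrary XML file may not parse as a table.

```matlab
% Create a small delimited sample file so the sketch runs on its own.
sampleName = 'sample_table.csv';   % hypothetical file name
fid = fopen(sampleName, 'w');
fprintf(fid, 'id,value\n1,10\n2,NA\n3,30\n4,40\n5,50\n');
fclose(fid);

% Treat the text NA as a missing value, as in the snippet above.
ds = datastore(sampleName, 'TreatAsMissing', 'NA');
ds.ReadSize = 2;                   % read at most 2 rows per call

totalRows = 0;
while hasdata(ds)
    T = read(ds);                  % table holding the next block of rows
    totalRows = totalRows + height(T);
    % Process or filter T here, then discard it before the next read.
end

fprintf('Read %d rows in blocks of up to %d.\n', totalRows, ds.ReadSize);
```

Only one block is in memory at a time, so ReadSize can be tuned to trade speed against memory use.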
