Splitting Cell Contents into a Uniform Dimension

Quang Phung on 10 Oct 2018
Edited: dpb on 11 Oct 2018
Hi,
I've been having some issues getting uniform dimensions when I'm splitting my cells.
The code I'm using to split my cell separates my data quite well. The problem is that the file I need to import isn't consistent in its contents: some rows have blank values, and those blanks aren't counted when a row is split into columns. As a result, some of the split cells end up being 4x1 or 8x1.
Code Attached Here:
% READ FILE (keep the file identifier so the same handle can be closed)
fid = fopen('PROPERTY.txt');
PROPERTY = textscan(fid,'%s','Delimiter','\n');
% CLOSE FILE
fclose(fid);
% DATA EXTRACTION
PROPERTY = PROPERTY{1,1};
% CELL COLUMNIZATION
PROPERTY = cellfun(@(cIn)strsplit(cIn,' ')',PROPERTY,'UniformOutput',false);
PROPERTY = [PROPERTY {:}]';
This leads to the following output (Attachment 1). As you can see, the dimensions of the columnized cells are not consistent. I'd like to know how to create a uniform 9x1 output in each cell, replacing any blank data with a 0 in its associated column.
I have also attached the file that I need to extract my data from as well (PROPERTY.txt).
Thanks in advance
EDIT: I should also add, I'm only really concerned about the data in the rows that display atomic data.
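To make the request concrete, here is a minimal sketch of padding each whitespace-split row out to 9 entries with '0'. The two sample lines are invented for illustration and are not taken from PROPERTY.txt. Note the limitation: padding is appended at the end of each row, so this approach cannot tell which column was actually blank, and a value may land in the wrong column.

```matlab
% Hypothetical sample rows (NOT the real contents of PROPERTY.txt)
rawLines = {'RA-223 11.435 D 130 0.7'; ...
            'RA-226 1600 Y 12.8 1.5 280'};
nCols = 9;   % target number of columns per row

% Split each row at whitespace; each result is a 1-by-N cell row
tokens = cellfun(@(s) strsplit(strtrim(s)), rawLines, 'UniformOutput', false);

% Append '0' entries until every row has nCols tokens
padded = cellfun(@(t) [t, repmat({'0'}, 1, nCols - numel(t))], ...
                 tokens, 'UniformOutput', false);

% Stack the rows into one N-by-9 cell array of character vectors
PROPERTY = vertcat(padded{:});
```

In the first sample row the token 0.7 ends up in column 5 no matter which column it occupied in the file, which is why a position-based (fixed-width) read is usually the safer route for this kind of data.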
  3 comments
Quang Phung on 10 Oct 2018
I believe you may be right.
The input text is based off code derived from Fortran punch cards. I believe this attachment details the space allocated for each variable.
dpb on 10 Oct 2018
AHA! I suspected as much. IF you will be using only files that are consistent with that format, then one of the text import objects should "work more better".
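As a rough illustration of why a fixed-column layout helps, the sketch below slices one record by character position instead of by whitespace, so a blank field keeps its place. The field widths and the sample record are assumptions made up for this example; the real widths would come from the format description in the attachment.

```matlab
widths = [10 9 1 10];           % assumed field widths, for illustration only
stops  = cumsum(widths);        % last character position of each field
starts = stops - widths + 1;    % first character position of each field

% Hypothetical record whose fourth field is entirely blank
rec = sprintf('%-10s%9s%1s%10s', 'RA-229', '4.0', 'M', '');
rec(end+1:stops(end)) = ' ';    % pad short lines with blanks (no-op here)

% Cut out each field by position and trim the surrounding blanks
fields = arrayfun(@(a,b) strtrim(rec(a:b)), starts, stops, ...
                  'UniformOutput', false);

% Replace blank fields with '0' while keeping them in their own column
fields(cellfun(@isempty, fields)) = {'0'};
```

Because every field is addressed by its column range, the number of fields per record is always the same, regardless of how many of them are blank.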

Accepted Answer

Guillaume on 10 Oct 2018
Use a FixedWidthImportOptions object to specify the format of your file and import as a table:
opts = matlab.io.text.FixedWidthImportOptions;
opts.DataLines = [2 Inf];
opts.VariableNames = {'NUCLIDE', 'XHL', 'AHL', 'SIGCTH', 'COEFFC', 'RIC', 'SIGFTH', 'COEFFF', 'RIF'}; % variable names must be unique
opts.VariableWidths = [10 9 1 10 10 10 10 10 10];
opts = setvartype(opts, opts.VariableNames, {'char', 'double', 'char', 'double', 'double', 'double', 'double', 'double', 'double'});
properties = readtable('PROPERTY.txt', opts)

More Answers (1)

dpb on 10 Oct 2018
opts = matlab.io.text.FixedWidthImportOptions; % create a basic import object to fill in
opts.DataLines=2; % data start on line 2 (the property is named DataLines)
opts.VariableNames={'Nuclide','HalfLife','Units','SigCaptTh','CoeffC','RiC','SigFissTh','CoeffF','RiF'};
opts=setvartype(opts,{'char','double','char','double','double','double','double','double','double'});
opts.VariableWidths=[10,9,1,10*ones(1,6)]; % just pick first to empty columns in nuclide
opts=setvaropts(opts,[4:9],'FillValue',0);
T=readtable('PROPERTY.txt',opts);
resulted in
>> T(1:10,:)
ans =
  10×9 table
    Nuclide     HalfLife    Units    SigCaptTh    CoeffC    RiC     SigFissTh    CoeffF    RiF
    ________    ________    _____    _________    ______    ____    _________    ______    ___
    'RA-223'    11.435      'D'      130          0         0       0.7          0         0
    'RA-226'    NaN         'Y'      12.8         1.5       280     0            0         0
    'RA-227'    42.2        'M'      100          2.4       500     0            0         0
    'RA-228'    5.76        'Y'      36           0.8       150     0            0         0
    'RA-229'    4           'M'      0            0         0       0            0         0
    'AC-227'    21.773      'Y'      890          5         1660    0            0         0
    'AC-228'    6.13        'H'      150          1         200     0            0         0
    'AC-229'    1.04        'H'      750          4.9       1000    0            0         0
    'TH-227'    18.718      'D'      0            0         0       202          0         0
    'TH-228'    1.9131      'Y'      123          4.9       1014    0            0         0
>>
Of course, as I've lamented for Lo! these 30+ years, it would have been SO much easier if TMW had retained the Fortran FORMAT style for formatted i/o instead of the C style, which doesn't understand fixed-width files. Then one could have just written the proper FORMAT statement and been done with it instead of having to fight through the complexity of the options structure. But at least they have finally implemented something that, however complex and bulky, does work.
NB: You MUST create the empty object; for some reason the one that is created by default if one uses
opt=detectImportOptions('PROPERTY.txt');
does NOT include the .VariableWidths property and it will NOT let you add it to that structure. That appears to be a bug to me or at least a serious quality of implementation issue. I'll submit a SRQ on it.
  5 comments
dpb on 11 Oct 2018
MATLAB syntax borrows mostly from Fortran; the earliest versions were, in fact, written in FORTRAN--one-based arrays, the style of FOR...NEXT mimics DO...CONTINUE, etc., etc., ... Other than throwing away GOTO it's pretty straightforward from one to the other. Unfortunately, all most people know about Fortran is hearsay and mostly inaccurate and 40+ years out of date as to the current state of modern Fortran.
That aside, it took 30+ years of MATLAB before TMW provided any way at all to handle fixed-width text files, and heaven only knows how many hours users have wasted trying to read such files, a problem that the simple expedient of dropping out into a FORMATted Fortran READ() would have solved in an instant.
/RANTWARN Ages ago I built a simple version via a mex function and submitted it to TMW, but they never picked up on it. Unfortunately, the source got lost back when I left the consulting gig; I didn't realize I didn't have a copy on my personal machine at home. I've never been able to get a Fortran compiler to work with recent versions of ML; they no longer support anything other than Intel, and the mex setup interface is such an abomination that I've just totally given up trying to get anything else to work. /ENDRANT
dpb on 11 Oct 2018
Edited: dpb on 11 Oct 2018
I intended to get back to this earlier but...
I noticed earlier that NaN values are returned for some of the half-lives; this happens because readtable didn't parse floating-point fields that have only a plus sign, and no exponent letter, introducing the exponent.
There is an option in the tabularTextDatastore object to define the exponent characters, but that appears to be missing from the FixedWidthImportOptions object. And tabularTextDatastore doesn't know anything about fixed-width fields, so there's still a hole in the import space.
Thus, it appears one will have to read that field as char and parse it separately.
NB: However, writing the exponent without a letter is an extension to the Fortran Standard specification for the E FORMAT descriptor, so, although interpreting the string as if the letter were there is a fairly common extension, it is fair to say the form isn't supported. I'm sure the C Standard has a similar restriction.
However #2: in some instances there is also a blank between the last significant digit of the mantissa and the exponent sign character, which is likewise a malformed input form.
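Following the suggestion to read the field as char and parse it separately, here is a minimal sketch of one way to normalize the two malformed exponent forms before converting. The sample strings are hypothetical, not values taken from PROPERTY.txt, and the regular expression is an assumption about what the bad fields look like (a trailing signed integer with no exponent letter, optionally preceded by blanks).

```matlab
% Hypothetical field contents, as they might arrive when the half-life
% column is imported as char instead of double
raw = {'11.435'; '1.600+3'; '7.54 +4'; '2.4E-2'};

txt = strtrim(raw);

% Insert the missing 'E' before a trailing signed exponent that directly
% follows a digit or decimal point, dropping any blanks in between.
% Well-formed values such as '2.4E-2' do not match and are left alone.
txt = regexprep(txt, '(?<=[\d.])\s*([+-]\d+)$', 'E$1');

% Convert to double; anything still unparseable becomes NaN
val = str2double(txt);
```

With a table like T above, the same two lines could be applied to the char column and the result assigned back, for example T.HalfLife = str2double(regexprep(strtrim(T.HalfLife), ...)), assuming that column was imported with type 'char'.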

Release

R2017b