Importing file with text and numbers
Hi,
I'm trying to load a text file that contains both text and numbers into MATLAB. The first few lines of the file are shown below:
<<Time = 0.0494352
Patch: waterFlow found on 1/1 processor(s)
Flux at waterFlow = -0.0125m^3/s [-750 l/min]
Patch: airFlowIn found on 1/1 processor(s)
Flux at airFlowIn = 0.0125345m^3/s [752.073 l/min]
Patch: outlet found on 1/1 processor(s)
Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]
Time = 0.0496235
Patch: waterFlow found on 1/1 processor(s)
Flux at waterFlow = -0.0125m^3/s [-750 l/min]
Patch: airFlowIn found on 1/1 processor(s)
Flux at airFlowIn = 0.0125345m^3/s [752.073 l/min]
Patch: outlet found on 1/1 processor(s)
Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]
Time = 0.0498117
Patch: waterFlow found on 1/1 processor(s)
Flux at waterFlow = -0.0125m^3/s [-750 l/min]
Patch: airFlowIn found on 1/1 processor(s)
Flux at airFlowIn = 0.0125345m^3/s [752.073 l/min]
Patch: outlet found on 1/1 processor(s)
Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]>>
This is a very long file where the data is given at each time step. I need to sort the time and the values for the fluxes. I tried textscan but it was unsuccessful.
I really appreciate any ideas and suggestions.
Thanks \Hale
2 comments
dpb
on 6 Jul 2013
Is the blank line between data records real or a figment of the cut'n paste operation?
Accepted answer
More answers (3)
the cyclist
on 6 Jul 2013
0 votes
If you have a relatively recent release of MATLAB, you can use the Import Data tool that is found on the Home tab of the Command Window.
You can read about it (and all kinds of other options for importing data) here:
Miroslav Balda
on 6 Jul 2013
0 votes
The previous answer gives a possible solution; however, the function fgetl is rather slow. An alternative may be the function
ffread www.mathworks.com/matlabcentral/fileexchange/9034
The function does free-format reading of ASCII files. The lines can be analyzed after the file has been read. Good luck.
Mira
1 comment
dpb
on 7 Jul 2013
I'm not sure what that particular FEX submission actually does, but one can read the whole file in one big slurp (assuming it will all fit in memory) w/ fread() as a character array and then loop thru the records in memory, if desired.
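A minimal sketch of that "one big slurp" idea follows. The temporary sample file, its two shortened records, and the variable names (fname, txt, recs) are assumptions made only so the snippet runs on its own; point fopen at the real file instead.

```matlab
% Hypothetical sample file so the sketch is self-contained.
fname = [tempname() '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Time = 0.0494352\n');
fprintf(fid, 'Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]\n');
fprintf(fid, 'Time = 0.0496235\n');
fprintf(fid, 'Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]\n');
fclose(fid);

% One big slurp: read every byte as char, then transpose to a row vector.
fid = fopen(fname, 'r');
txt = fread(fid, Inf, '*char')';
fclose(fid);
delete(fname);

% Split into lines in memory; '\r?\n' accepts Windows and Unix line ends.
recs = regexp(txt, '\r?\n', 'split');
recs = recs(~cellfun('isempty', recs));   % drop the empty piece after the last newline
```

After this, recs is a cell array with one line per cell, so any looping or regexp parsing happens in memory rather than against the file.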
per isakson
on 6 Jul 2013
Edited: per isakson
on 6 Jul 2013
If the file fits in memory, this is one way to read it.
Maybe '\r\n' needs to be replaced by '\n'. That depends on the source of the file. Or replace '\r\n' by '[\r]*\n' to handle both cases with the same code.
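A small check of the '[\r]*\n' suggestion, as a sketch only: the two sample strings and the variable names are made up for illustration, using lines in the format quoted in the question.

```matlab
% The same two-step sample, once with Windows (\r\n) and once with Unix (\n) line ends.
winText  = sprintf('Time = 0.0494352\r\nPatch: outlet found on 1/1 processor(s)\r\nTime = 0.0496235\r\n');
unixText = strrep(winText, sprintf('\r\n'), sprintf('\n'));

% Split into blocks at each line break that is followed by "Time".
blocksWin  = regexp(winText,  '[\r]*\n(?=Time)', 'split');
blocksUnix = regexp(unixText, '[\r]*\n(?=Time)', 'split');

% Split the first block into lines; the pattern swallows an optional \r.
linesWin  = regexp(blocksWin{1},  '[\r]*\n', 'split');
linesUnix = regexp(blocksUnix{1}, '[\r]*\n', 'split');

sameResult = isequal(linesWin, linesUnix);   % true when both variants split identically
```

The point of the comparison is that one pattern covers both line-ending styles, so the code does not have to know where the file came from.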
Next step is to decide what data shall be kept and in what data structures.
Replace disp( ca2{jj} ) by code that parses one line at a time. See dpb's answer.
Try
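function cssm()
    % Read the whole file into one character vector.
    str = fileread( 'blocks.txt' );
    % Split into blocks at each line break that is followed by "Time".
    ca1 = regexp( str, '\r\n(?=Time)', 'split' );
    len = length( ca1 );
    % use len to allocate memory for variables to store data.
    for ii = 1 : len
        % Split the current block into its individual lines.
        ca2 = regexp( ca1{ii}, '\r\n', 'split' );
        for jj = 1 : length( ca2 )
            disp( ca2{jj} )
        end
    end
end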
returns
Time = 0.0494352
Patch: waterFlow found on 1/1 processor(s)
Flux at waterFlow = -0.0125m^3/s [-750 l/min]
Patch: airFlowIn found on 1/1 processor(s)
Flux at airFlowIn = 0.0125345m^3/s [752.073 l/min]
Patch: outlet found on 1/1 processor(s)
Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]
Time = 0.0496235
Patch: waterFlow found on 1/1 processor(s)
Flux at waterFlow = -0.0125m^3/s [-750 l/min]
Patch: airFlowIn found on 1/1 processor(s)
Flux at airFlowIn = 0.0125345m^3/s [752.073 l/min]
Patch: outlet found on 1/1 processor(s)
Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]
Time = 0.0498117
Patch: waterFlow found on 1/1 processor(s)
Flux at waterFlow = -0.0125m^3/s [-750 l/min]
Patch: airFlowIn found on 1/1 processor(s)
Flux at airFlowIn = 0.0125345m^3/s [752.073 l/min]
Patch: outlet found on 1/1 processor(s)
Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]
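For the parsing step, one possible sketch is to skip the line-by-line loop and pull the numbers out with regexp tokens. This is not the method from the accepted answer; the in-memory sample text, the number pattern numPat, and all variable names are assumptions, and the reshape relies on every time step listing the same patches in the same order (as in the sample above).

```matlab
% Sample text in the format quoted in the question (two time steps).
str = sprintf('%s\n', ...
    'Time = 0.0494352', ...
    'Patch: waterFlow found on 1/1 processor(s)', ...
    'Flux at waterFlow = -0.0125m^3/s [-750 l/min]', ...
    'Patch: airFlowIn found on 1/1 processor(s)', ...
    'Flux at airFlowIn = 0.0125345m^3/s [752.073 l/min]', ...
    'Patch: outlet found on 1/1 processor(s)', ...
    'Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]', ...
    'Time = 0.0496235', ...
    'Patch: waterFlow found on 1/1 processor(s)', ...
    'Flux at waterFlow = -0.0125m^3/s [-750 l/min]', ...
    'Patch: airFlowIn found on 1/1 processor(s)', ...
    'Flux at airFlowIn = 0.0125345m^3/s [752.073 l/min]', ...
    'Patch: outlet found on 1/1 processor(s)', ...
    'Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]');

% Signed decimal number with an optional exponent.
numPat = '([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)';

% One token per "Time = ..." line.
tTok = regexp(str, ['Time = ' numPat], 'tokens');
t = cellfun(@(c) str2double(c{1}), tTok(:));          % column vector of times

% Two tokens per "Flux at ..." line: patch name and value in m^3/s.
fTok = regexp(str, ['Flux at (\w+) = ' numPat 'm\^3/s'], 'tokens');
vals = cellfun(@(c) str2double(c{2}), fTok);

% Assumes the same patches, in the same order, at every time step.
nPatch = numel(fTok) / numel(t);
names  = cellfun(@(c) c{1}, fTok(1:nPatch), 'UniformOutput', false);
flux   = reshape(vals, nPatch, []).';                 % one row per time step
```

Here t(k) is the time of step k and flux(k, :) holds the fluxes for the patches listed in names. With a real file, str would come from fileread instead of sprintf.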
5 comments
dpb
on 7 Jul 2013
If you open the file w/ the 'rt' option, then \n should be right for the platform (assuming the file was written on the same platform Hale is processing it on) and there's no need to worry about \r\n, etc. ('t' doesn't make any difference on Unix-like OSes but is important on Windows. Might as well just use it 'cuz it does "the right stuff" for the platform.)
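A rough sketch of the text-mode route, reading line by line with fgetl: the temporary sample file and the variable names are assumptions for illustration, and only the "Time" lines are parsed here.

```matlab
% Hypothetical sample file, written in text mode so the sketch runs on its own.
fname = [tempname() '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, 'Time = 0.0494352\n');
fprintf(fid, 'Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]\n');
fprintf(fid, 'Time = 0.0496235\n');
fprintf(fid, 'Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]\n');
fclose(fid);

% Open in text mode ('rt') and read one line at a time.
fid = fopen(fname, 'rt');
t = [];
while true
    tline = fgetl(fid);            % returns -1 (not a char) at end of file
    if ~ischar(tline)
        break
    end
    if strncmp(tline, 'Time', 4)
        t(end+1, 1) = sscanf(tline, 'Time = %f');   % keep the time value
    end
end
fclose(fid);
delete(fname);
```

For a very long file, growing t inside the loop is slow; preallocating it or reading the file in one piece are the usual alternatives.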
per isakson
on 7 Jul 2013
Edited: per isakson
on 7 Jul 2013
"assuming the file was written on the same platform". Yes, but text files are often used to transfer data between different platforms, because text is the simplest format that is platform "independent".
dpb
on 7 Jul 2013
Yes, but a high fraction of cases will be same-platform and the OP will know... and if it is, the 't' makes dealing w/ the \n transparent. May as well start w/ the trivial case and worry about the odd one if it exists.
$0.02, imo, ymmv, etc., etc., etc., ...
per isakson
on 8 Jul 2013
Edited: per isakson
on 8 Jul 2013
The answers to a question will ideally provide a little "smorgasbord". I offer one small dish, without too much thought.
I hope that more than one reader will benefit from the "smorgasbord".
The doc of R2012a says:
[...]To open files in text mode, attach the letter 't' to the permission,
such as 'rt' or 'wt+'. For better performance, do not use text mode.[...]
A long time ago I ceased using the 't' because of the performance penalty. I've kind of forgotten that it exists.
Hmmm...R2012b (doc) says
To open files in text mode, attach the letter 't' to the permission, such as 'rt' or 'wt+'.
For better performance, do not use text mode. The following applies on Windows systems, in text mode: ...
This additional processing is unnecessary for most cases. All MATLAB import functions, and most text editors (including Microsoft Word and WordPad), recognize both '\r\n' and '\n' as newline sequences. However, when you create files for use in Microsoft Notepad, end each line with '\r\n'. ...
I have only recently been blessed by TMW w/ an update to R2012b (from R12); the R12 doc doesn't have anything specific about the performance hit and has the warning
... To open in text mode, add "t" to the permission string, for example 'rt' and 'wt+'. (On Unix, text and binary mode are the same so this has no effect. But on PC systems this is critical.)
I'm of the age when it was indeed the case that much Windows software including my favorite programmers' editor didn't deal w/ the non-Windows \n sequence at all gracefully so I just continue to operate in that mode.
I guess I'll have to update my thinking/advice for Matlab specifically and let users run into their own quirks w/ other packages if they still aren't graceful.
I do see that TMW ought then to update the help text for fopen() to be more consistent as it still has the same verbiage as does R12.1 and no real indication of any real performance hit.
From R2012b session...
>> help fopen
fopen Open file.
...
You can open files in binary mode (the default) or in text mode.
In binary mode, no characters get singled out for special treatment.
In text mode on the PC, the carriage return character preceding
a newline character is deleted on input and added before the newline
character on output. To open a file in text mode, append 't' to the
permission string, for example 'rt' and 'w+t'. (On Unix, text and
binary mode are the same, so this has no effect. On PC systems
this is critical.)
So, I'll modify my warnings if TMW will fix help... :)