Get data points from one line

Hello
I have some data in the form
...
-> Parameter number 54 : Cell_C_ph1_pat1 4.4501491 ( +/- 0.72288327E-04 )
-> Parameter number 55 : Cell_A_ph1_pat1 11.445057 ( +/- 0.16855453E-03 )
-> Parameter number 56 : Cell_B_ph1_pat1 4.1313801 ( +/- 0.61447019E-04 )
-> Parameter number 57 : X-tan_ph1_pat1 0.33901680 ( +/- 0.41584419E-02 )
-> Parameter number 58 : V-Cagl_ph1_pat1 -0.20550521E-01( +/- 0.47112759E-02 )
-> Parameter number 59 : W-Cagl_ph1_pat1 0.20377478E-02( +/- 0.27476726E-03 )
-> Parameter number 60 : U-Cagl_ph1_pat1 0.18869112 ( +/- 0.19129461E-01 )
...
I'm trying to get the values after the name of the parameter (i.e. the 4.4501491 +/- 0.72288327E-04 at Cell_C_ph1_pat1), but I'm struggling a little with regexp/strfind. At the moment, I have something like
buffer = fileread('SnSe_100K_17.out');
substr = '(?<=Cell_C_ph1_pat1\D*)\d*\.?\d+';
numbers = str2double(regexp(buffer,substr,'match'))
This just gives the first value of Cell_C (4.4501491), and I would love to also get the error. Actually, if it's possible to get three vectors out (one with the names of the parameters, one with the values, and one with the errors), that would be perfect!
I have a lot of data files in one folder, so I think I would want a for-loop that gets the same data from the other files.

Accepted Answer

Cedric Wannaz on 13 Oct 2017
Edited: Cedric Wannaz on 13 Oct 2017
That was a good attempt, but for this you should use tokens:
buffer = fileread( 'SnSe_100K_17.out' ) ;
pattern = ':\s+(\S+)\s+([^\(]+)\D+(\S+)' ;
tokens = regexp( buffer, pattern, 'tokens' ) ;
tokens = vertcat( tokens{:} ) ;
names = tokens(:,1) ;
data = str2double( tokens(:,2:3) ) ;
Using tokens is roughly the same as matching the normal way: you define a pattern that matches the full block containing what you need to extract, and then you enclose in () the parts of the pattern that you specifically want extracted. These parts are called tokens.
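As a small illustration of the difference between the two modes, the sketch below runs the same pattern with 'match' and with 'tokens'. The string and the pattern are made up for this example and are not taken from the data file.

```matlab
% Hypothetical one-line input, used only to contrast 'match' and 'tokens'.
str     = 'width = 12' ;
pattern = '(\w+)\s*=\s*(\d+)' ;

% 'match' returns the whole text matched by the pattern.
m = regexp( str, pattern, 'match' ) ;    % {'width = 12'}

% 'tokens' returns only the parts framed by () in the pattern,
% one inner cell array per match.
t = regexp( str, pattern, 'tokens' ) ;   % {{'width', '12'}}
```

The nested cell array returned by 'tokens' is the reason for the vertcat( tokens{:} ) step in the answer: it stacks the per-match cells into one cell array with one row per match.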
The pattern that I built here is based on the following observations:
  • Parameter names are always separated from their surroundings by white space and do not contain white space themselves, hence the \S+ to match them.
  • Values cannot be matched the same way, because in some cases they touch the opening parenthesis associated with the error, so we match them using [^\(]+ (one or more characters that are not an opening parenthesis).
  • The text between a value and its error contains no numeric digits, and because the errors are symmetrical they carry no sign, so we can "eat the string" after the value for as long as the characters are not numeric digits, hence the \D+.
  • Errors seem to be followed by a white space, so \S+ captures them.
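To check these observations without the data file, the pattern can be applied to two of the lines quoted in the question, one where the value is followed by a space and one where it touches the parenthesis. This is only a sketch; the variable names sample, values and errors are chosen here for illustration, and the split into three separate vectors follows what the question asked for.

```matlab
% Two lines copied from the question, joined with a newline.
sample = sprintf( '%s\n%s', ...
    '-> Parameter number 54 : Cell_C_ph1_pat1 4.4501491 ( +/- 0.72288327E-04 )', ...
    '-> Parameter number 58 : V-Cagl_ph1_pat1 -0.20550521E-01( +/- 0.47112759E-02 )' ) ;

pattern = ':\s+(\S+)\s+([^\(]+)\D+(\S+)' ;
tokens  = regexp( sample, pattern, 'tokens' ) ;
tokens  = vertcat( tokens{:} ) ;           % 2x3 cell: name, value, error

names  = tokens(:,1) ;                     % cell array of parameter names
values = str2double( tokens(:,2) ) ;       % trailing blanks in the token are ignored
errors = str2double( tokens(:,3) ) ;
```

The value token of the first line keeps its trailing space because [^\(]+ runs up to the parenthesis; str2double tolerates that, so no extra trimming is needed.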
With that, you get:
>> names
names =
7×1 cell array
{'Cell_C_ph1_pat1'}
{'Cell_A_ph1_pat1'}
{'Cell_B_ph1_pat1'}
{'X-tan_ph1_pat1' }
{'V-Cagl_ph1_pat1'}
{'W-Cagl_ph1_pat1'}
{'U-Cagl_ph1_pat1'}
>> data
data =
4.4501 0.0001
11.4451 0.0002
4.1314 0.0001
0.3390 0.0042
-0.0206 0.0047
0.0020 0.0003
0.1887 0.0191
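For the many files mentioned in the question, the same extraction could be wrapped in a loop. The following is a minimal sketch under some assumptions: the folder name 'data', the '*.out' filter, and the results struct with its field names are all placeholders introduced here, not something given in the thread.

```matlab
folder  = 'data' ;                               % hypothetical folder name
files   = dir( fullfile( folder, '*.out' ) ) ;   % empty if nothing matches
pattern = ':\s+(\S+)\s+([^\(]+)\D+(\S+)' ;

% One struct element per file that contains parameter lines.
results = struct( 'file', {}, 'names', {}, 'values', {}, 'errors', {} ) ;

for k = 1 : numel( files )
    buffer = fileread( fullfile( folder, files(k).name ) ) ;
    tokens = regexp( buffer, pattern, 'tokens' ) ;
    if isempty( tokens ), continue ; end         % no parameter lines in this file
    tokens = vertcat( tokens{:} ) ;
    results(end+1).file = files(k).name ;        % grow the struct array by one
    results(end).names  = tokens(:,1) ;
    results(end).values = str2double( tokens(:,2) ) ;
    results(end).errors = str2double( tokens(:,3) ) ;
end
```

Keeping the names per file, rather than assuming they are identical everywhere, makes it possible to verify afterwards that all files list the same parameters in the same order.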
  2 Comments
Cedric Wannaz on 13 Oct 2017
Add a space before the : in the pattern. Not that annoying, I've seen worse ;)
pattern = ' :\s+(\S+)\s+([^\(]+)\D+(\S+)' ;
If the parameter numbers are right-justified, there will be no problem. If not, we just have to match a white space or a numeric digit before the colon:
pattern = '[\d\s]:\s+(\S+)\s+([^\(]+)\D+(\S+)' ;
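The effect of anchoring the colon can be seen on a small sample. In this sketch the first line ('Title: SnSe refinement') is a made-up header, assumed here only to stand in for any other colon that might appear in the file; the second line is copied from the question.

```matlab
% Hypothetical header line followed by one parameter line from the question.
sample = sprintf( 'Title: SnSe refinement\n%s', ...
    '-> Parameter number 54 : Cell_C_ph1_pat1 4.4501491 ( +/- 0.72288327E-04 )' ) ;

loosePattern  = ':\s+(\S+)\s+([^\(]+)\D+(\S+)' ;        % any colon starts a match
strictPattern = '[\d\s]:\s+(\S+)\s+([^\(]+)\D+(\S+)' ;  % colon must follow a digit or space

loose  = regexp( sample, loosePattern,  'tokens' ) ;
strict = regexp( sample, strictPattern, 'tokens' ) ;

looseName  = loose{1}{1} ;    % 'SnSe': the match started at the header colon
strictName = strict{1}{1} ;   % 'Cell_C_ph1_pat1'
```

With the loose pattern the match starts at the header's colon and swallows the real parameter line into the value token, so the anchored version is the safer choice whenever the file contains text other than the parameter list.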
