reading text from various positions

Question

Franz Kohlhus on 27 Sep 2016

https://es.mathworks.com/matlabcentral/answers/304755-reading-text-from-various-positions

Commented: dpb on 4 Oct 2016

I have a .txt with floats and strings I want to import. The text was created by a software logging all events within an experiment (by time), organized in trials, so it looks a bit like:

+++ LogStart1 +++
procedure = org
List = 45
Condition = 43
DelayOnset = 1
[and a lot more of variables]
+++ LogEnd +++
+++ LogStart2 +++
[and the same for the second trial and so on...]

I would like to get a vector of all values for each variable (e.g., "DelayOnset"). I started importing data with fscanf. Let's say "DelayOnset" was the third string in the file:

filename = 'data.txt';
fid = fopen(filename);
% skip one string, skip another string; the value I want follows "DelayOnset="
formatSpecDELAY = '%*s %*s DelayOnset=%f';
DelayOnset = fscanf(fid, formatSpecDELAY, 1);
fclose(fid);

This worked well, but I can't do this for each variable: the file contains 1000+ lines, so I would have to skip every object before the value I want to read, that is, write %*s a thousand times. Initially I thought that if I didn't limit the number of objects (1 in the example above), I would get every DelayOnset value in the file ("search for every "DelayOnset=" and return the float which follows"), but that was not the case. In fact, I had to skip all the objects between DelayOnset in the first trial and DelayOnset in the second trial in order to get a vector of both values. I can't do this for the whole file.

Is it possible to create several points of reference within the text file, in order to start fscanf from these points?

Thank you very much in advance!


dpb on 27 Sep 2016

Think we need to see at least some more of the file and a more specific description of what you're trying to read. Specifically, is there simply a variable and its associated value within each section or is there an array of values for a variable over the time of a trial or is the variable name duplicated for every event or...???


Answer 1

KSSV on 28 Sep 2016

https://es.mathworks.com/matlabcentral/answers/304755-reading-text-from-various-positions#answer_236407

Edited: KSSV on 28 Sep 2016

You can use textscan and copy all the text file data into a cell, and then find your required string.

clc; clear all ;
fid = fopen('data.txt') ;   % your text file in data.txt
S = textscan(fid,'%s','delimiter','\n') ;  % read the file line by line
fclose(fid) ;
S = S{1} ;   % cellstr, one cell per line
idx = strfind(S,'DelayOnset') ;  % search every line for your string
idx = find(not(cellfun('isempty', idx)));   % indices of the lines that contain it
S(idx)   % the matching lines


dpb on 28 Sep 2016

Edited: dpb on 28 Sep 2016

Ah...my old eyes had glossed over where the issue was so I just wrote a new solution from scratch--but now I see the problem.

Don't convert the cellstr array S to char array until after the string search so will return only the cells containing the desired string. Then, process that subset in character string form to extract the numeric data.

Or, you can get there this way as well, at this point however, S(response) is the subset of lines containing the text; you've still got to then read the numeric values from that array. S{response} otoh is a comma-separated list of values; that's not so easy to deal with for the purpose which is why I'd keep it as cellstr array until ready to do the conversion.

But, you're missing the part of using textscan on the subset of the overall file to read the data values at the above point; you've isolated the proper lines but not yet parsed them.

Or, of course, regexp could be made to do this, too, but I'm such a klutz with its syntax I'll leave that to the whizards of that arena... :)

I think my solution while as noted is very close to this is somewhat simpler in its sequence of operations and does do the last step as well...
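
A minimal sketch of the "keep it as a cellstr, then parse the subset" idea described in this comment. The sample lines, the variable names (key, hits, sub, vals) and the use of sscanf for the final step are illustrative assumptions, not code taken from either answer:

```matlab
% Hypothetical sample lines standing in for the cellstr S read from the file
S = {'Procedure= left'; 'response= 3'; 'Target= steel'; 'response= 5'};

key  = 'response';                           % variable name to look for
hits = ~cellfun(@isempty, strfind(S, key));  % logical mask of lines containing the key
sub  = S(hits);                              % still a cellstr: one cell per matching line

% Parse the number that follows "response= " on each matching line
vals = cellfun(@(x) sscanf(x, [key '= %f']), sub);
```

Indexing with parentheses, S(hits), keeps the result as a cell array that cellfun can iterate over, whereas S{hits} would expand into a comma-separated list.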

KSSV on 29 Sep 2016

Hello. Once you have the indices of your required string, you can easily extract the number from the string, can't you?

Try:

clc; clear all ;
fid = fopen('data.txt') ;   % your text file in data.txt
S = textscan(fid,'%s','delimiter','\n') ;  % read the file line by line
fclose(fid) ;
S = S{1} ;
response = strfind(S,'response') ;  % search every line for your string
response = find(not(cellfun('isempty', response)));   % indices of the lines that contain it
S(response)   % the matching lines
iwant = zeros(length(response),1) ;
for i = 1:length(response)
    tmp = regexp(S{response(i)},'\d+\.?\d*','match','once');  % first number on the line
    iwant(i) = str2double(tmp) ;   % NaN if the line holds no number
end


Answer 2

dpb on 28 Sep 2016

https://es.mathworks.com/matlabcentral/answers/304755-reading-text-from-various-positions#answer_236531

Edited: dpb on 29 Sep 2016

Essentially other respondent's solution with a few shortcuts along the way of not building the intermediaries primarily...although used textread to first scan the file as it saves the fopen/fclose hoopla when don't need the extra facilities of textscan (such as to scan a string in memory as later on). Built it to read whatever variable of this form in the file you're interested in by simply changing the STR variable. The only other real trick is note the transpose .' on the output of the conversion of the cellstr array found in the cast to char which is needed as textscan isn't cellstring literate. This is necessary as memory is column-major in Matlab so to scan the string must orient it so that the lines are essentially columns to be read. Otherwise, one must loop through record-by-record.
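
A small sketch of the transpose trick described above, assuming the matching lines have already been isolated. The sample lines and the names matched, c and data are hypothetical, and the lines are deliberately all the same length so that char adds no padding:

```matlab
% Hypothetical matching lines, all the same length
matched = {'response= 3'; 'response= 5'; 'response= 7'};

c = char(matched);   % 3-by-11 char matrix, one line per row

% MATLAB stores arrays column by column, so transposing puts each line in
% its own column and sscanf then walks through the text line by line.
data = sscanf(c.', 'response= %f');
```

Without the transpose, sscanf would see the first character of every line, then the second character of every line, and so on, which no longer matches the format.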

ADDENDUM

OK, to deal with the multiple records case, what I'd envision would be something like this:

STRS={'Procedure','response'};
fmts={'%s','%f'};
s=textread('koh.txt','%s','delimiter','\n','whitespace','');
for i=1:length(STRS)          % loop over the number to read...
  fmt=[STRS{i} '= ' fmts{i}]; % build the format string
  if strcmp(fmts(i),'%s')     % ok do need know which type variable reading
    txt=cellfun(@(x) sscanf(x,fmt),s(~cellfun(@isempty,strfind(s,STRS{i}))),'uniformoutput',0);
  else
    data=sscanf(char(s(~cellfun(@isempty,strfind(s,STRS{i})))).',fmt);
  end
end

The above should need only a little extra bookkeeping to add multiple data sets and text info by creating arrays for the outputs.

At command line here after having read the data file--there was also a missing set of curlies to dereference the cellstr in the first strfind call and the closing end on the for loop, but that's the sort of thing one can expect from "air code"...I made those corrections above, as well...

>> for i=1:length(STRS)           % loop over the number to read...
     fmt=[STRS{i} '= ' fmts{i}];  % build the format string
     if strcmp(fmts(i),'%s')      % ok do need know which type variable reading
       txt=cellfun(@(x) sscanf(x,fmt),s(~cellfun(@isempty,strfind(s,STRS{i}))),'uniformoutput',0);
     else
       data=sscanf(char(s(~cellfun(@isempty,strfind(s,STRS{i})))).',fmt);
     end
   end
>> txt
txt = 
    'left'
    'left'
>> data
data =
     3
     5
>>

As for using the results, I've noted I don't have the table class, but something similar in the Statistics Toolbox is the dataset. I'm not advocating that you use it instead, but to illustrate the type of thing it does,

>> ds=dataset(txt,data,'VarNames',STRS)
ds = 
    Procedure     response
    'left'        3   
    'left'        5   
>>

Now there's a composite data object with both variables you can address for analysis, etc., programmatically generically rather than with multiple variables and the like. The builtin table has all the dataset features and more...
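
For readers on a release that has the builtin table (R2013b or later), a rough equivalent of the dataset call might look like the sketch below. The values of txt and data are made up for illustration, with one 'left' and one 'right' trial:

```matlab
txt  = {'left'; 'right'};   % hypothetical parsed Procedure strings
data = [3; 5];              % hypothetical parsed response values

% One row per trial, one named column per variable
T = table(txt, data, 'VariableNames', {'Procedure', 'response'});

resp      = T.response;      % numeric column, addressed by name
firstProc = T.Procedure{1};  % text column is a cellstr, so use braces
```

As with dataset, each column is addressed by name rather than through a separate workspace variable.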


Franz Kohlhus on 2 Oct 2016

Thanks again!

At the moment, I am trying to expand the number of variables read out, like txt and data, and to create a dataset out of them. However, if I understood your code correctly, the if statement depends on whether the entry in fmts is a string format or not (which would make it the second variable). Hence, the branch depends on the two different kinds of variables. So how would it differ if they were all of the same kind, let's say %f, and if there were a lot more of them? (I guess we would be working with 'if ... elseif ... elseif ... else'?)

dpb on 2 Oct 2016

Edited: dpb on 3 Oct 2016

All that should be needed in the outline above is to simply list the variables in the STRS array and their corresponding format in the fmts array--that's why I did it that way. Any valid numeric string can be scanned with '%f' on input; the string tokens are the "odd man out" that need the different format for sscanf. You therefore only need two formats, you just need to know a priori which one goes with which variable. Or, of course, one can go to more effort in coding with try..catch blocks or the like to dynamically ascertain the type but with a relatively few items it seemed simpler to just enumerate 'em and go on...

But, as nice as it is to solve a problem for someone, look at Guillaume's solution--it returns all the token pairs automagically and all you're left with is selecting the ones of interest by name. That's pretty nice presuming you have recent-enough release of Matlab for the collection class to exist.

ADDENDUM

Also note that the complexity does grow somewhat more with the explicit solution; you'll have to either build a cell array of the results during the loop or build the dataset or table object during the loop or the subsequent passes thru the loop will overwrite the txt, data variables on the next pass, leaving you with only the last of each type after the loop without doing something about that. That was my previous comment...

A simplistic solution is to write

data=[];  % before the loop
for...
  ...
  data=[data sscanf(... ];

that will append the later set onto the first. This, however will require every set be the same length. You could create a column vector instead but then if they're not the same length you have a problem knowing which belongs to which variable. A cell array would work, but that means keeping an index to increment for each type. Doable certainly, but not, probably, the best solution given the expanded wishes and likely not the way I'd've started if you'd asked the more general question to begin with.
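
One way to avoid both the overwriting and the equal-length requirement is to store each variable's values in its own struct field. This is only a sketch under assumptions: the sample lines and the results struct are hypothetical, all listed variables are numeric, and every name in STRS must be a valid MATLAB field name (so a key such as Display.RefreshRate would need renaming first):

```matlab
% Hypothetical file content, already read into a cellstr
s = {'Subject= 1'; 'Level= 2'; 'response= 3'; 'response= 5'};

STRS    = {'Subject', 'Level', 'response'};   % numeric variables to collect
results = struct();                           % one field per variable

for i = 1:numel(STRS)
    hits = ~cellfun(@isempty, strfind(s, [STRS{i} '=']));   % lines for this key
    % Dynamic field name: vectors of different lengths can coexist
    results.(STRS{i}) = cellfun(@(x) sscanf(x, [STRS{i} '= %f']), s(hits));
end
```

Here results.response holds both trial values while results.Subject holds a single one, and nothing is overwritten between passes through the loop.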


Answer 3

Franz Kohlhus on 29 Sep 2016

https://es.mathworks.com/matlabcentral/answers/304755-reading-text-from-various-positions#answer_236605

Thank you very much, that worked! However, there is a related problem if I use this code on my data: Starting, again from this data extract

 Experiment= Tech
Subject= 1
Session= 1
Display.RefreshRate= 75.002
Level= 2
+++ log start +++
Niveau= IO
+++ Start +++
Procedure= left
Target= steel
response= 3
+++ log start +++
Niveau= IO
+++ Start +++
Procedure= right
Target= steel
response= 5

But now, in the first trial, the procedure is 'left' and in the second trial, the procedure is 'right'. As above, I still want to read the variable ('response') into a vector, but depending on the procedure: if the procedure in a trial is 'left' I need one vector, e.g. RESPONSE_LEFT, and if the procedure is 'right' I need a different one (RESPONSE_RIGHT). I think I know the tools (an if statement) and have tried a lot, but I still can't get the script working. Thanks!


dpb on 29 Sep 2016

Edited: dpb on 29 Sep 2016

Repeat the logic I provided changing the STR value to 'Procedure' and the fmt string to '%s' instead of '%f' on the array s read by textread. Those results will be in the same order as the numerical results so simply concatenate the two answers.

Alternatively, revert to a loop, but there's no need.

You could also use ismember to search for the combination of lines but it'll return the mixed bag of results intermixed with character and numeric answers to be scanned so for just the two pieces I'd do the above.

ADDENDUM Turns out textscan has difficulty scanning in memory for the string; wasn't worth the effort to try to dig through what precisely is its confusion; not having record markers is likely culprit, though, I think.

Anyway, just use sscanf and cellfun again instead...remember, this is after having read the full file content into cell array s initially:

>> STR2='Procedure';
>> fmt2=[STR2 '= %s'];
>> cellfun(@(x) sscanf(x,fmt2),s(~cellfun(@isempty,strfind(s,STR2))), ...
                   'uniformoutput',0)
ans = 
  'left'
  'left'
>>

Now since you've got a cell containing the one and the double containing the other piece of information, you'll have to either mush together into a cell array or convert the procedure to an integer value code--oh, newer releases have the table that will handle the disparate types but I can't show that as have R2012b here which predates it...

Anyways, it's pretty simple to just extend with a couple nits...

Franz Kohlhus on 29 Sep 2016

Thank you, but I am afraid I didn't describe the problem well. I still want to read the variable RESPONSE (that is, the value behind response=) into a vector, but depending on whether we are in a trial with procedure = right or procedure = left. In other words, if procedure = left, read the response value that follows within XY lines into RESPONSE_LEFT, and vice versa (so that in the end, there are two vectors of the response variable!). I guess for that, we need a loop, right?

dpb on 29 Sep 2016

Edited: dpb on 29 Sep 2016

No! Just string the two pieces together to parse the two record types--but you only need to read the file once. The only reason for a loop might be to place a set of STR and fmt values in an array for the number of record types to be processed and iterate over that generically rather than using two variables (or reusing the same ones would also work, of course) as I did just for demo purposes.

There isn't really any need to make two response variables; just use the other as the grouping variable. You could, of course, create the two by separating them out by using the indicator variable to select, but it would seem likely that it's just as easy or even, perhaps, easier to simply have one response variable, not two. Particularly with, as noted, the facilities built into the table class.
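
If the two separate vectors are wanted after all, selecting with the indicator variable could be sketched as follows. The sample lines and the helper isKey are hypothetical, and the sketch assumes every trial contains exactly one Procedure line and one response line, in the same order:

```matlab
% Hypothetical file content, already read into a cellstr
s = {'Procedure= left';  'Target= steel'; 'response= 3'; ...
     'Procedure= right'; 'Target= steel'; 'response= 5'};

% Logical mask of the lines that contain "key="
isKey = @(key) ~cellfun(@isempty, strfind(s, [key '=']));

% One procedure string and one response value per trial, in file order
proc = cellfun(@(x) sscanf(x, 'Procedure= %s'), s(isKey('Procedure')), ...
               'UniformOutput', false);
resp = cellfun(@(x) sscanf(x, 'response= %f'), s(isKey('response')));

% The procedure acts as the grouping variable for the response values
RESPONSE_LEFT  = resp(strcmp(proc, 'left'));
RESPONSE_RIGHT = resp(strcmp(proc, 'right'));
```

No explicit loop or if statement is needed; strcmp on the cellstr returns the logical index directly.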


Answer 4

Guillaume on 29 Sep 2016

https://es.mathworks.com/matlabcentral/answers/304755-reading-text-from-various-positions#answer_236692

Edited: Guillaume on 29 Sep 2016

Not having read the other answers, here is how I would deal with your problem:

filecontent = fileread('C:\somewhere\Yourfile');  %read whole content of file at once
keyvaluepairs = regexp(filecontent, '([^=\n\r]*)= ([^=\n\r]*)', 'tokens');  %identify all key-value pairs (any strings separated by '= ')
keyvaluepairs = vertcat(keyvaluepairs{:});  %transform cell array of cell arrays into a two-column cell array
[keys, ~, rows] = unique(keyvaluepairs(:, 1)); %get unique keys and corresponding rows
values = accumarray(rows, (1:numel(rows))', [], @(ridx) {keyvaluepairs(ridx, 2)});  %group together all values for each key
mymap = containers.Map(keys, values);  %store it into a map for easy querying

Querying for any key is then straightforward, e.g.:

mymap('response')
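
The map returns the values as text, so numeric variables still need converting. A self-contained sketch, assuming a small in-memory string in place of the real file and adding a sort inside the accumarray function (an addition not in the code above) so that values stay in file order:

```matlab
% Hypothetical file content held in memory instead of being read with fileread
filecontent = sprintf('Procedure= left\nresponse= 3\nProcedure= right\nresponse= 5\n');

keyvaluepairs = regexp(filecontent, '([^=\n\r]*)= ([^=\n\r]*)', 'tokens');
keyvaluepairs = vertcat(keyvaluepairs{:});        % N-by-2 cell: key, value
[keys, ~, rows] = unique(keyvaluepairs(:, 1));    % unique keys and group index
% sort(ridx) keeps each key's values in file order, since accumarray does
% not promise the order in which it passes the indices of a group
values = accumarray(rows, (1:numel(rows))', [], ...
                    @(ridx) {keyvaluepairs(sort(ridx), 2)});
mymap = containers.Map(keys, values);

response  = str2double(mymap('response'));   % cellstr of digits to numeric column
procedure = mymap('Procedure');              % text values stay as a cellstr
```

str2double returns NaN for any entry that is not a valid number, which makes unexpected text in a numeric variable easy to spot.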


Guillaume on 4 Oct 2016

You can silence the warning with:

warning('off', 'MATLAB:iofun:UnsupportedEncoding');

However, I would leave it on as a reminder that you're using an undocumented and unsupported option that may break / disappear in future releases. Matlab does not officially support UTF-16.

dpb's suggestion would work if you read the file normally (e.g. with fileread) and if all the characters in the file have code < 256 (not guaranteed if there are some non-US-English characters). You would do the filtering immediately after reading the file content.

dpb on 4 Oct 2016

@Franz--Guillaume's comment has merit but warnings all the time are annoying so if the application that builds the raw data files does use UTF16 and you can't easily change that, personally I'd turn off the warning and make a comment in the m-file about what the issue is.

While it is unsupported in other areas at least so far, I don't see that TMW can possibly regress in removing at least minimal support and gradually increasing other support in Matlab--the encoding isn't going to go away and they'll just be left further and further behind if were to do so.

As he also notes, the "fixup" does work as long as character codes are within the lower 8bit UTF8 character set which it appears from the type of file is likely the case....but certainly not guaranteed. Of course, given the limited support elsewhere, if you find some in a file you'll possibly have other difficulties arise anyway that you'll have to work around.
