textscan: how to use a flexible formatSpec string?

Hey fellow "Matlabers"!
I have been struggling with this problem for quite a while and so far have not found a solution. Maybe someone here knows one or has a suggestion.
The problem:
I am reading several .txt files that contain all kinds of data (strings, dates, numbers), and I only need to extract information from a few columns. The problem is that I need to ignore a certain number of characters (read as a skipped string field) until I reach the first column with data I need. The number of characters varies from file to file, so I don't know how to specify it in the formatSpec string that I pass to textscan. In the example below, 59 is the value that varies; each file has a different number of characters to discard.
Example:
formatSpec = '%*59*s%10{dd/MM/yyyy}D%6{HH:mm}D%*10*s%*14s%10s%*8*s%*14s%10s%[^\n\r]';
textscan(fileID, '%[^\n\r]', startRow-1, 'ReturnOnError', false);
dataArray = textscan(fileID, formatSpec, 'Delimiter', '', 'WhiteSpace', '', 'ReturnOnError', false);
Error message:
Error using textscan
Unable to read the DATETIME data with the format 'dd/MM/yyyy'. If the data is not a time, use %q to get string data.
Any idea how I can automate this process?
Thanks in advance!
EDIT: I have added two .txt files as examples of the kind of data I am dealing with.

2 Comments

Stephen23 on 20 May 2015
Your description is great. All that is missing are a few sample text files, so that we can test our code on them and see if it works. You can upload a few test files using the paperclip button; note that you will need to push both the Choose file and Attach file buttons.
It is much easier for us and also for you if we have real data to work with!
BSantos on 20 May 2015
Stephen,
I thought about adding my files, but I'm afraid I can't due to company restrictions. I will try to edit my .txt files and remove some of the information so I don't get into trouble.
Thanks!

Answers (1)

Walter Roberson on 20 May 2015
ToSkip = 59;
formatSpec = ['%*', sprintf('%d', ToSkip), '*s%10{dd/MM/yyyy}D%6{HH:mm}D%*10*s%*14s%10s%*8*s%*14s%10s%[^\n\r]'];
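The same idea can also be written with a single sprintf call for the variable part, keeping the fixed remainder of the format untouched. A minimal sketch; ToSkip is just a placeholder value here, and the names skipSpec and restSpec are illustrative, not from the thread:

```matlab
% Build the textscan format string from a skip count known only at run time.
% ToSkip is a placeholder; in practice it would be determined per file.
ToSkip = 59;

% Everything after the skipped prefix, copied unchanged from the question.
restSpec = '%10{dd/MM/yyyy}D%6{HH:mm}D%*10*s%*14s%10s%*8*s%*14s%10s%[^\n\r]';

% In sprintf, '%%' produces a literal '%', so this yields '%*59*s'.
skipSpec = sprintf('%%*%d*s', ToSkip);

formatSpec = [skipSpec, restSpec];
disp(formatSpec)
```

Note that restSpec is concatenated rather than passed through sprintf, so its %D and %[^\n\r] fields are not reinterpreted.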

5 Comments

BSantos on 20 May 2015
Dear Walter,
Thanks for your answer, but the problem is that I don't know what the number will be. It can be 59 or any other value. How can I find that number in advance?
Walter Roberson on 20 May 2015
Then you can't ignore those characters; you need to pay attention to them to figure out where the interesting data starts.
What distinguishing feature is there for the place you do want the scanning to start?
BSantos on 20 May 2015
My data should start with a datetime value, i.e. a date in the format dd/MM/yyyy. But before the date/time column there are other columns containing numbers as well.
Any idea how to figure out the number of characters to discard before scanning starts?
thanks!
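Since the date is the distinguishing feature, one way to find the number is to search a data line for the first dd/MM/yyyy date and count the characters before it. A minimal sketch, assuming the offset is the same on every data line of a given file and that no earlier column contains text shaped like dd/MM/yyyy; the sample line and the names dataLine and datePattern are hypothetical:

```matlab
% Hypothetical sample data line; in a real script this would come from fgetl.
dataLine = 'ID7 0042 x 20/05/2015 14:30 rest of line';

% Two digits, '/', two digits, '/', four digits, not embedded in a longer number.
datePattern = '(?<!\d)\d{2}/\d{2}/\d{4}(?!\d)';
startIdx = regexp(dataLine, datePattern, 'once');   % index of first match, or []
if isempty(startIdx)
    error('No dd/MM/yyyy date found on the line.');
end

ToSkip = startIdx - 1;   % number of characters before the date

% Reuse the count in the format from the question ('%%' -> literal '%').
formatSpec = [sprintf('%%*%d*s', ToSkip), ...
    '%10{dd/MM/yyyy}D%6{HH:mm}D%*10*s%*14s%10s%*8*s%*14s%10s%[^\n\r]'];
fprintf('Characters to skip: %d\n', ToSkip);
```

To apply this to a file, one option is to skip the header lines with fgetl, read the first data line, compute ToSkip, then call frewind(fileID) and skip the headers again before the textscan call from the question.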
The two sample files you provided can be handled by using
repmat('%*s',1,5)
as the data to skip.
By the way, are the columns possibly tab separated?
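The repmat('%*s',1,5) idea avoids counting characters altogether: each %*s skips one whitespace-separated field, however wide it is. A sketch under the assumption that the five leading columns contain no embedded spaces; the sample line and the names nSkip and sampleLine are hypothetical, and the call deliberately keeps textscan's default WhiteSpace handling:

```matlab
% Skip a fixed number of whitespace-separated fields instead of a fixed
% number of characters. Assumption: no spaces inside the skipped columns.
nSkip = 5;
skipSpec = repmat('%*s', 1, nSkip);   % '%*s%*s%*s%*s%*s'
formatSpec = [skipSpec, '%{dd/MM/yyyy}D%{HH:mm}D%[^\n\r]'];

% Hypothetical sample line with five leading columns of varying width.
sampleLine = 'ID7 0042 x 3.5 ABC 20/05/2015 14:30 rest of line';

% Default whitespace handling is kept so that each %*s stops at a space.
C = textscan(sampleLine, formatSpec);
disp(C{1})   % the date column
```

This relies on spaces acting as field separators, so the 'Delimiter','' and 'WhiteSpace','' options from the original call would most likely need to be dropped for it to work.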
BSantos on 20 May 2015
Edited: BSantos on 20 May 2015
Walter,
Thanks, I will try this out in my script. Unfortunately not; the software generating these .txt files is a bit "user unfriendly". Its CSV files are even harder to handle than the text files, so I chose to save my results as text files.
If this works, I will post here.
EDIT:
Well, I get the same error as in my question. Any other suggestions?
Thanks!!

Asked: 20 May 2015

Edited: 20 May 2015