Ignoring header/footer in textfile question

m j
m j on 29 Jan 2020
Commented: Walter Roberson on 29 Jan 2020
For the past week I've been trying to open multiple text files that have different headers/footers at the same time, ignoring all headers/footers and extracting just the data, without knowing in advance what the headers/footers are. The only thing I know is that the headers/footers always start with a char and form a string.
All headers/footers start with a char. Examples:
File 1:
Line 1 of file - Samplerate : 100000
Line 2 of file - Bitspersample: 12
Rest of lines - data (2000 samples, floats)
File 2:
Line 1 of file - Bitspersample: 32
Line 2 of file - Normalized: FALSE
Lines 3-2500 - data (2500 samples, floats)
Line 2501 of file - Channel: A
Is there a way to ignore all lines of a text file that start with a char/string?


Answers (1)

Walter Roberson
Walter Roberson on 29 Jan 2020
fileread() the file.
regexprep() with pattern '^\s*[^0-9+.-].*$', replacement '' (the empty string), and the 'lineanchors' and 'dotexceptnewline' options. This will zap the content of lines whose first non-whitespace character is not a digit or + or - or period. The 'dotexceptnewline' option is needed because by default the dot also matches newline, so .* would run past the end of the line. If your data never has a leading + on the numbers then do not include the + in the pattern. If your data never has numbers that start with a period without a leading 0 then do not include the period in the pattern. This is the question of whether a number like .5 can occur or if it would always be written 0.5.
In the case where your data never has a leading + or - or period then instead of the pattern I showed, you can use '^\s*\D.*$'
After the regexprep, textscan() the string.
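
The three steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: the file contents are made up to resemble "File 2" from the question, the data is assumed to be one float per line with no leading whitespace and no blank lines in the original file, and the variable names (fname, str, cleaned, data) are hypothetical. The argument order is regexprep(str, expression, replace, options...), so the replacement is the empty string and 'lineanchors' and 'dotexceptnewline' are options that follow it.

```matlab
% Write a small sample file (hypothetical contents modelled on "File 2").
fname = [tempname, '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Bitspersample: 32\nNormalized: FALSE\n-0.012\n0.44\n5.44\nChannel: A\n');
fclose(fid);

% Step 1: read the whole file into one character vector.
str = fileread(fname);
delete(fname);

% Step 2: blank out every line whose first non-whitespace character
% cannot start a number. 'lineanchors' lets ^ and $ match at every line;
% 'dotexceptnewline' makes .* stop at the end of the current line.
pattern = '^\s*[^0-9+.-].*$';
cleaned = regexprep(str, pattern, '', 'lineanchors', 'dotexceptnewline');

% Step 3: parse the numeric text that is left.
C = textscan(cleaned, '%f');
data = C{1};
```

The header and footer lines are left behind as empty lines rather than removed, so the line count of cleaned is the same as that of the original file. If the data has several columns per line, the '%f' format would need to be adapted to match.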


m j
m j on 29 Jan 2020
Thanks, but I'm having a hard time deciphering regexprep. I can see why you have a ^ and $ for the 'lineanchors' option, but I'm having trouble working out the rest in between them in '^\s*[^0-9+.-].*$'. There's an explanation for \s, but I can't seem to find an explanation for the rest, i.e. *[^0-9+.-].*
Also, for newStr = regexprep(str,expression,replace): as I understand it, str would come from fileread, expression would be '^\s*[^0-9+.-].*$', and replace would be 'lineanchors'? Am I understanding this correctly? Sorry if not, English isn't my first language. Plus I'm not at my PC; when I get home I'll try it.
And my data will have both + and - floats that always have a leading 0, for example: -0.012, 0.44, 5.44, etc.
Walter Roberson
Walter Roberson on 29 Jan 2020
regexprep(str, '^\s*[^0-9+-].*$', '', 'lineanchors', 'dotexceptnewline')
[] means any one character chosen from the list inside of the [], except when the first thing inside the [] is ^, in which case it means any one character that is NOT one of the listed ones. So the construct matches any one character that is NOT 0123456789 or + or -. In short, you are looking for lines in which the first nonblank character is something that cannot possibly be forming a number.
The .* after that, with the dotexceptnewline option, matches to the end of the same line. When you find such a line you replace it with emptiness (but without removing the newline character itself), so you get an empty line in place of any line that starts with a non-number.
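
To see the negated character class at work, the pattern can be tried on individual lines. The sketch below is only illustrative: the sample lines are made up from the examples in the question, and sampleLines, firstMatch and isHeader are hypothetical names. Because each string is a single line, ^ and $ already anchor to its start and end, so no options are needed here.

```matlab
% Pattern from the comment above: optional leading whitespace, then one
% character that is NOT a digit, + or -, then the rest of the line.
pattern = '^\s*[^0-9+-].*$';

% Hypothetical sample lines: two header/footer lines and three numbers.
sampleLines = {'Samplerate : 100000', '-0.012', '0.44', '+5.44', 'Channel: A'};

% With 'once', regexp returns the start index of the match for each
% string, or [] when the pattern does not match.
firstMatch = regexp(sampleLines, pattern, 'once');
isHeader = ~cellfun(@isempty, firstMatch);

% Report which lines the pattern would blank out.
for k = 1:numel(sampleLines)
    fprintf('%-22s header/footer: %d\n', sampleLines{k}, isHeader(k));
end
```

Only the two lines that begin with a letter are flagged; the lines beginning with a digit, + or - are left alone. Note that whitespace is itself "not a digit, + or -", so this sketch assumes the numeric lines do not begin with spaces or tabs.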

