Ignoring header/footer in textfile question
Hello,
For the past week I've been trying to open multiple text files that have different headers/footers, ignoring all headers/footers and extracting just the data, without knowing in advance what the headers/footers are. The only thing I know is that the headers/footers always start with a non-numeric character and form a string.
Examples:
File 1:
Line 1 of file - Samplerate : 100000
Line 2 of file - Bitspersample: 12
Rest of lines - data (2000 samples, floats)
File 2:
Line 1 of file - Bitspersample: 32
Line 2 of file - Normalized: FALSE
Lines 3-2502 - data (2500 samples, floats)
Line 2501 of file - Channel: A
Is there a way to ignore all lines of a text file that start with a char/string?
Answers (1)
Walter Roberson
on 29 Jan 2020
fileread() the file.
regexprep() with pattern '^\s*[^0-9+.-].*$', replacement '' (the empty string), and the 'lineanchors' and 'dotexceptnewline' options. This will zap the content of lines whose first non-whitespace character is not a digit or + or - or period. If your data never has a leading + on the numbers then do not include the + in the pattern. If your data never has numbers that start with a period without a leading 0 then do not include the period in the pattern. This is the question of whether a number like .5 can occur or if it would always be written 0.5.
In the case where your data never has a leading + or - or period, then instead of the pattern I showed, you can use '^\s*\D.*$'
After the regexprep, textscan() the string.
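The three steps above can be sketched as follows. This is only a minimal sketch under stated assumptions, not a definitive implementation: the sample file contents and the variable names (fname, str, cleaned, data) are made up for illustration, modeled on "File 2" from the question. It assumes one number per line, no blank lines, and no leading whitespace on data lines. It also passes 'dotexceptnewline' so that .* stops at the end of each line.

```matlab
% Write a small sample file (hypothetical contents, modeled on "File 2").
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Bitspersample: 32\nNormalized: FALSE\n0.25\n-1.5\n3\nChannel: A\n');
fclose(fid);

% Step 1: read the whole file into one character vector.
str = fileread(fname);

% Step 2: empty every line whose first non-whitespace character
% cannot start a number (not a digit, +, - or period).
cleaned = regexprep(str, '^\s*[^0-9+.-].*$', '', ...
    'lineanchors', 'dotexceptnewline');

% Step 3: parse what is left as floating-point numbers.
c = textscan(cleaned, '%f');
data = c{1};   % column vector of the numeric samples

delete(fname); % remove the temporary sample file
```

The same three calls can be run in a loop over several file names. If the files may contain blank lines or indented data lines, the pattern should be checked against a sample first, since \s and a negated character class can both match spaces and newlines.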
2 comments
Walter Roberson
on 29 Jan 2020
regexprep(str, '^\s*[^0-9+-].*$', '', 'lineanchors', 'dotexceptnewline')
[] means any one character chosen from the list inside the [], except when the first thing inside the [] is ^, in which case it means any one character that is NOT one of the listed ones. So the construct [^0-9+-] matches any one character that is NOT 0123456789 or + or -. In short, you are looking for lines in which the first nonblank character is something that cannot possibly start a number.
The .* after that, with the dotexceptnewline option, matches to the end of the same line. When you find such a line you replace it with emptiness (but without removing the newline character itself), so you get an empty line in place of any line that starts with a non-number.
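A small in-memory check of this behavior, as a sketch only: the sample string and the variable names (str, out, parts, nBefore, nAfter) are assumptions for illustration, modeled on "File 1" from the question. It counts newline characters before and after the replacement to show that header lines are emptied rather than deleted.

```matlab
% Hypothetical sample: two header lines followed by three data lines.
str = sprintf('Samplerate : 100000\nBitspersample: 12\n0.5\n-2\n+7\n');

% Empty every line whose first non-whitespace character is not a digit, + or -.
out = regexprep(str, '^\s*[^0-9+-].*$', '', ...
    'lineanchors', 'dotexceptnewline');

% The newline characters themselves are untouched, so the counts match.
nBefore = sum(str == char(10));
nAfter  = sum(out == char(10));
fprintf('%d newlines before, %d after\n', nBefore, nAfter);

% Splitting on newline shows empty entries where the headers used to be.
parts = regexp(out, '\n', 'split');
```

Because the line count is preserved, the position of each data line in the cleaned string still corresponds to its line number in the original file.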