Help using the arrayfun() function to apply strsplit() to all entries in a string array

I'm trying to wrap my head around how the arrayfun() function works and would greatly appreciate some help with a specific example:
I have a string array of weather data.
weather_strings =
10×1 string array
"UTC,2140991,49.0"
"UTC,2140992,49.1"
"UTC,2140993,49.1"
...
I need to extract the values after the second comma (temperatures) as a 1x10 matrix of doubles, [49.0, 49.1, 49.1, ...].
I've figured out a clunky way to do this for a single entry (please let me know if there's a better way).
weather_string = weather_strings(1) % extract only the first entry
weather_string_split = strsplit(weather_string, ',') % apply strsplit() to split on commas
weather_string_split_trim = weather_string_split(:,3) % extract only 3rd column
weather_num_trim = str2num(weather_string_split_trim) % convert from string to double
But I can't seem to figure out how to use arrayfun() to apply that to every entry. I've tried:
weather_strings_split = arrayfun(strsplit(weather_strings,','), weather_strings) % apply stringsplit to split on commas, for all elements?
which gives the error message:
Error using strsplit (line 80)
First input must be either a character vector or a string scalar.
Error in test_window (line 17)
weather_strings_split = arrayfun(strsplit(weather_strings,','), weather_strings)
I'm probably missing something painfully obvious. What is it? I'm still somewhat of a beginner at coding, so I welcome you to explain it to me like I'm 5 years old.
Alternatively, if there's a clever way to extract these numbers directly from this data table (which came directly from a webread() function), I'd love to hear it. Var3 is a cell array.
weather_data_table =
10×3 table
Var1 Var2 Var3
__________ ________ __________________
2018-11-26 17:41:25 'UTC,2140991,49.0'
2018-11-26 17:42:27 'UTC,2140992,49.1'
2018-11-26 17:43:28 'UTC,2140993,49.1'
...
Again, the goal is to get just the last numbers after the second comma of Var3 into a 1D matrix.
Thanks in advance!

Accepted Answer

Try this:
for k1 = 1:size(weather_strings,1)
Col3(k1,:) = str2double(regexp(weather_strings{k1}, '\d*\.\d*', 'match'));
end
Col3 =
49.0000
49.1000
49.1000
The loop is necessary because regexp is not vectorised. It can only handle one string at a time.
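On the original arrayfun() question: the attempted call fails because strsplit(weather_strings,',') is evaluated on the whole array before arrayfun() ever runs, while arrayfun() expects a function handle as its first argument. A minimal sketch of the handle-based form, assuming every entry has exactly three comma-separated fields (the variable names parts and temps are illustrative, not from the thread):

```matlab
weather_strings = ["UTC,2140991,49.0"; "UTC,2140992,49.1"; "UTC,2140993,49.1"];

% Pass a function handle: arrayfun calls it once per element, so strsplit
% receives a single string scalar each time instead of the whole array.
% 'UniformOutput', false is needed because each result is a 1x3 string array.
parts = arrayfun(@(s) strsplit(s, ','), weather_strings, 'UniformOutput', false);

% Take the third field of each split result and convert it to double.
temps = cellfun(@(p) str2double(p(3)), parts);

% Reshape the column into the row vector the question asks for.
temps = temps.';
```

This shows how arrayfun() is meant to be called; it is not necessarily faster than the loop above.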

6 Comments

"The loop is necessary because regexp is not vectorised. It can only handle one string at a time."
According to the regexp documentation, the first input may be "specified as a character vector, a cell array of character vectors, or a string array. Each character vector in a cell array, or each string in a string array, can be of any length and contain any characters." I often use regexp with cell arrays containing multiple different char vectors, and I don't see any reason why it should not work with string arrays, just as its documentation states.
>> C = {'UTC,2140991,49.0','UTC,2140992,49.1','UTC,2140993,49.1'}
>> regexp(C,'\d*\.\d*', 'match','once')
ans =
'49.0'
'49.1'
'49.1'
The regexp call threw an error with the column vector when I tried it. It works with a row vector, and OP may not want to re-format the column vector to a row vector. Also, while arrayfun definitely has its uses, when I’ve used it for problems like this, it’s been significantly slower than a simple loop, which surprised me. Thus, the loop.
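For comparison, a sketch of the loop-free call on column-shaped input. It relies on the documented behaviour quoted above, that regexp accepts a cell array of character vectors. Converting with cellstr() first is an assumption made here so that the result is a plain cell array of char vectors for str2double; it does not reproduce or explain the error reported in the comment above.

```matlab
weather_strings = ["UTC,2140991,49.0"; "UTC,2140992,49.1"; "UTC,2140993,49.1"];

% Convert the 3x1 string array to a 3x1 cell array of character vectors.
C = cellstr(weather_strings);

% With 'once', regexp returns the first match per element as a char vector,
% so the result is a 3x1 cell array of char vectors.
matches = regexp(C, '\d*\.\d*', 'match', 'once');

% str2double converts a cell array of char vectors element by element.
Col3 = str2double(matches);
```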
Awesome, thank you so much Star Strider and Stephen Cobeldick! This works brilliantly for my temperature data.
Is there a way to write a similar regexp function that would isolate the number from the end of the line, regardless of whether or not it contains a decimal point? (Which is why I originally tried to use commas as delimiters.)
I also need the same function to clean up my Humidity data, which has whole integer values.
For example:
weather_strings =
10×1 string array
"UTC,2140991,59"
"UTC,2140992,61"
"UTC,2140993,60"
...
If the user selects Humidity data instead of Temperature data right now, I get the following error message:
Unable to perform assignment because the indices on the left side are not compatible with the size of the right
side.
Error in clean_data (line 14)
clean_weather_strings(:,k) = regexp(weather_strings{k}, '\d*\.\d*', 'match');
Error in Lab7 (line 23)
clean_weather_doubles = clean_data(weather_data_table) % give input to clean_data function, save output
I assume this is because our '\d*\.\d*' expression looks for digits separated by a period. I'm just not familiar enough with the syntax of the regexp() function to know how to set it up differently.
Thanks again!
My pleasure!
‘I assume this is because our '\d*\.\d*' expression looks for digits separated by a period.’
Correct.
‘I'm just not familiar enough with the syntax of the regexp() function to know how to set it up differently.’
The regexp function can act ‘logically’: given a choice between '\d*\.\d*' and '\d*', it will choose the correct pattern, with ‘|’ designating a logical ‘or’.
To accommodate both, the regexp call changes to:
for k1 = 1:size(weather_strings,1)
Col3(k1,:) = str2double(regexp(weather_strings{k1}, '\d*\.\d*|\d*', 'match'));
end
Out = Col3(:,2)
This works for both when I tested it, amazingly enough!
(I still have much to learn about regexp myself.)
Another easy solution: '\d+\.?\d*'
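A sketch of how a pattern of this kind can cover both data sets in one call. The trailing `$` anchor is an addition assumed here, not part of the suggestion above; it ties the match to the end of each entry so that 'once' returns the last number rather than the first run of digits.

```matlab
temperature = ["UTC,2140991,49.0"; "UTC,2140992,49.1"];
humidity    = ["UTC,2140991,59";   "UTC,2140992,61"];

% One or more digits, an optional period, optional further digits,
% then the end of the text.
pattern = '\d+\.?\d*$';

% cellstr() gives cell arrays of char vectors, which str2double accepts.
t = str2double(regexp(cellstr(temperature), pattern, 'match', 'once'));
h = str2double(regexp(cellstr(humidity),    pattern, 'match', 'once'));
```

Without the anchor, 'match' returns both numbers in each entry (the middle integer and the final value), which is why the loop version above selects the second column.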


More Answers (1)

Andrei Bobrov on 28 Nov 2018
Edited: Andrei Bobrov on 28 Nov 2018
In R2016b:
>> weather_strings = string({'UTC,2140991,49.0'
'UTC,2140992,49.1'
'UTC,2140993,49.1'})
weather_strings =
3x1 string array
"UTC,2140991,49.0"
"UTC,2140992,49.1"
"UTC,2140993,49.1"
>> str2double(regexp(weather_strings,'(\d+\.)?\d+$','match','once'))
ans =
49
49.1
49.1
>>
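The comma-delimiter idea from the question can also be applied to the whole column at once with split(), which is available for string arrays from R2016b. A minimal sketch, assuming every entry of Var3 has exactly three comma-separated fields; the small table built here is a stand-in for the one returned by webread().

```matlab
% Stand-in for the table from the question; Var3 is a cell array of char vectors.
Var3 = {'UTC,2140991,49.0'; 'UTC,2140992,49.1'; 'UTC,2140993,49.1'};
weather_data_table = table(Var3);

% Convert the column to a 3x1 string array, then split every entry on commas.
% Because each entry has the same number of fields, the result is a 3x3 string array.
parts = split(string(weather_data_table.Var3), ',');

% The third column holds the values; convert to double and transpose to a row.
temps = str2double(parts(:, 3)).';
```

This works whether or not the last field contains a decimal point, but split() errors if the entries do not all have the same number of fields.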

Release: R2018b
Asked: 26 Nov 2018
Edited: 28 Nov 2018
