How to extract matches from results of a regexp match
Bill Tubbs
on 8 Jun 2022
I'm trying to find the columns of a table that match a pattern. This works:
col_names = {'X_est_9', 'X_est_10', 'Y_est_9', 'Y_est_10', 'E_obs_9', 'E_obs_10'};
result = regexp(col_names, 'E_obs_\d*', 'match')
But the result is a cell array of cells (not sure why):
result =
1×6 cell array
{0×0 cell} {0×0 cell} {0×0 cell} {0×0 cell} {1×1 cell} {1×1 cell}
I just want a cell array of the matched results:
matched_col_names =
1×2 cell array
{'E_obs_9'} {'E_obs_10'}
Must be an easier way than this:
matched_col_names = cellfun(@(x) x, result(~cellfun(@isempty, result)))
Accepted Answer
Stephen23
on 8 Jun 2022
Edited: Stephen23
on 19 Jun 2022
"But the result is a cell array of cells (not sure why):"
Summary: you need to use the ONCE option.
Explanation: There are two things going on in your question. Firstly, you used the default ALL option, which matches every occurrence of the regular expression in the input text, and there could be two or more of them. Because there could be multiple matches, the outputs for each input are nested in a cell array (you can see this by reading through the output descriptions in the REGEXP documentation, too many to copy here).
Because you only want to match the regular expression once (not multiple times), you should specify the ONCE option... this will remove one level of nested cell arrays from the output. If you are planning on using REGEXP, you will find the ONCE option very useful.
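To make the difference concrete, here is a small sketch comparing the default behaviour with the ONCE option. The shortened col_names list is only for illustration:

```matlab
col_names = {'X_est_9', 'E_obs_9', 'E_obs_10'};   % shortened list, for illustration only

% Default (ALL): every element of the output is itself a cell array of matches.
all_matches = regexp(col_names, 'E_obs_\d*', 'match');

% ONCE: every element is a character vector (empty where nothing matched).
once_matches = regexp(col_names, 'E_obs_\d*', 'match', 'once');

class(all_matches{2})    % cell
class(once_matches{2})   % char
isempty(once_matches{1}) % true, because 'X_est_9' does not match
```

Both outputs still have three elements, one per input; only the nesting changes.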
Secondly the MATCH output cell array always has the same size as the input cell array. If you provide it with a six-element cell array, then you will get a six-element cell array at the output. So your expected output size is not supported by REGEXP (and for reasons of traceability should not occur).
But you can remove the empty elements yourself. This is quite easy and much more efficient than your code:
col_names = {'X_est_9', 'X_est_10', 'Y_est_9', 'Y_est_10', 'E_obs_9', 'E_obs_10'};
result = regexp(col_names, 'E_obs_\d*', 'match', 'once')
result(cellfun('isempty',result)) = []
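If what you need is the matching column names rather than the matched substrings, a possible variation is to build a logical mask and index col_names with it. This is only a sketch: the anchored pattern '^E_obs_\d+$' is an assumption about what should count as a match (the whole name, with at least one digit), and the variable names start_idx, is_match and matched_positions are made up for the example.

```matlab
col_names = {'X_est_9', 'X_est_10', 'Y_est_9', 'Y_est_10', 'E_obs_9', 'E_obs_10'};

% With 'once' and no output selector, REGEXP returns the start index of the
% first match for each input, or [] where there is no match.
start_idx = regexp(col_names, '^E_obs_\d+$', 'once');

is_match = ~cellfun('isempty', start_idx);   % logical mask, same size as col_names
matched_col_names = col_names(is_match)      % the full names that matched
matched_positions = find(is_match)           % their positions in col_names
```

Keeping is_match around preserves the link between each result and its input element, so the same mask can also be used to index the table columns.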
Bonus: You might find this tool useful when developing regular expressions:
1 Comment
the cyclist
on 8 Jun 2022
Today I learned about the 'once' option (which I did not find, despite looking through the docs). But will I remember?!
:-)
More Answers (2)
the cyclist
on 8 Jun 2022
Even when using a single character array input along with the 'match' option, MATLAB has to return outputs in a cell array, to be able to handle cases where there are multiple matches within a single input:
regexp('E_obs_9 E_obs_10','E_obs_\d*','match')
Because you are passing in a cell array of character arrays, you get out a cell array of cell arrays. You get the empty ones because MATLAB has no way of "knowing" that you don't want the empty ones. In particular, if it only output two cells, you would have no way of knowing which two input elements those two outputs corresponded to.
So, I'm afraid that you are stuck doing the post-processing step, as far as I can tell.
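One possible form of that post-processing step, sketched here with a made-up input list, is to concatenate the nested cells using a comma-separated list. It keeps every match, including several from one input, but it discards the information about which input each match came from:

```matlab
% Hypothetical inputs: the second element contains two matches.
names = {'X_est_9', 'E_obs_9 E_obs_10', 'E_obs_11'};

% Default (ALL) behaviour: one inner cell array per input element.
nested = regexp(names, 'E_obs_\d+', 'match');

% Concatenate the inner cell arrays; the empty ones simply drop out.
flat = [nested{:}]
```

Here flat is a 1×3 cell array holding 'E_obs_9', 'E_obs_10' and 'E_obs_11'.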