Split cell array rows by delimiter (2016b)
Hau Kit Yong
on 27 Jun 2019
Commented: Jan
on 27 Jun 2019
I have a vertical cell array of char vectors that I want to split into smaller vertical cell arrays based on rows in the array that serve as delimiters. For example,
x = ...
{'LINE1'; ...
'* THIS IS A COMMENT LINE'; ...
'* THERE CAN BE MORE THAN ONE COMMENT LINE'; ...
'LINE2'; ...
'LINE3'};
should be split into
x_split = ...
{{{'LINE1'}}; ...
{'LINE2';'LINE3'}};
where lines starting with '* ' are comment lines that act as delimiters.
I would like the operation to be as fast as possible, so I am looking for a vectorized approach, perhaps involving cellfun/arrayfun. I can get the indices of the comment lines easily enough using cellfun and strncmp, but I'm not sure how to proceed with the splitting.
2 comments
Jan
on 27 Jun 2019
You did not mention why the first line is stored as a scalar cell array while the other two form a cell vector. Do you want to group the char vectors, using each block of comment lines as a separator?
Accepted Answer
Jan
on 27 Jun 2019
Edited: Jan
on 27 Jun 2019
Let's start with a loop approach to clarify what exactly you want:
C = {'LINE1'; ...
'* THIS IS A COMMENT LINE'; ...
'* THERE CAN BE MORE THAN ONE COMMENT LINE'; ...
'LINE2'; ...
'LINE3'};
limit = [true, strncmp(C, '*', 1).', true]; % no need for the slow cellfun here!
ini = strfind(limit, [true, false]);
fin = strfind(limit, [false, true]) - 1;
n = numel(ini);
Result = cell(n, 1);
for k = 1:n
Result{k} = C(ini(k):fin(k));
end
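Calling strfind on a logical vector relies on behavior that, as far as I know, is not documented for non-text inputs. As a sketch under that assumption, the same start and end indices can be obtained with plain logical indexing. The comments trace the intermediate values for the example input:

```matlab
C = {'LINE1'; ...
     '* THIS IS A COMMENT LINE'; ...
     '* THERE CAN BE MORE THAN ONE COMMENT LINE'; ...
     'LINE2'; ...
     'LINE3'};

isCmt = strncmp(C, '*', 1).';   % 1x5 logical row: [0 1 1 0 0]
limit = [true, isCmt, true];    % padded at both ends: [1 0 1 1 0 0 1]

% A block starts where a true is directly followed by a false ...
ini = find(limit(1:end-1) & ~limit(2:end));       % [1 4]
% ... and ends where a false is directly followed by a true.
fin = find(~limit(1:end-1) & limit(2:end)) - 1;   % [1 5]

Result = cell(numel(ini), 1);
for k = 1:numel(ini)
    Result{k} = C(ini(k):fin(k));   % rows ini(k)..fin(k) form block k
end
```

Because limit has one extra leading element, position p in limit corresponds to row p-1 of C. This is why ini needs no correction while fin is reduced by 1.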
Now you hope that a vectorized approach or cellfun is faster? I do not think so.
Maybe find(diff()) is faster than calling strfind twice:
limit = [true, strncmp(C, '*', 1).', true]; % no need for the slow cellfun here!
index = find(diff(limit));
n = numel(index) / 2;
Result = cell(n, 1);
for k = 1:n
Result{k} = C(index(2*k-1):index(2*k)-1);
end
Well, let's try splitapply:
isComment = strncmp(C, '*', 1);
index = zeros(size(C));
index(strfind([true, isComment.'], [true, false])) = 1; % transpose: isComment is a column vector
index = cumsum(index);
index(isComment) = NaN;
Result = splitapply(@(x) {x}, C, index);
This seems to be too complex. mat2cell is more direct:
isCmt = strncmp(C, '*', 1);
limit = [true, isCmt.', true];
ini = strfind(limit, [true, false]);
fin = strfind(limit, [false, true]) - 1;
Result = mat2cell(C(~isCmt), (fin - ini + 1).');
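mat2cell only needs the length of each block, so the find(diff()) indices can feed it directly. The following is a variation on the code above, not part of the original answer, and the input is a hypothetical one chosen to check that leading and trailing comment lines produce no empty blocks:

```matlab
% Hypothetical input that starts and ends with comment lines
C = {'* HEADER COMMENT'; ...
     'LINE1'; ...
     '* SEPARATOR'; ...
     'LINE2'; ...
     'LINE3'; ...
     '* TRAILING COMMENT'};

isCmt = strncmp(C, '*', 1);
limit = [true, isCmt.', true];             % [1 1 0 1 0 0 1 1]
edges = find(diff(limit));                 % [2 3 4 6]: starts alternate with ends
len   = edges(2:2:end) - edges(1:2:end);   % block lengths: [1 2]
Result = mat2cell(C(~isCmt), len.');       % {{'LINE1'}; {'LINE2'; 'LINE3'}}
```

Padding limit with true on both sides treats the virtual rows before and after C as comments, so every block is enclosed by a falling and a rising edge.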
Some timings:
C = repmat(C, 10000, 1); % A larger input
% With tic/toc, Matlab 2019a ONLINE:
% STRFIND: 0.084 sec
% FIND(DIFF): 0.091 sec
% SPLITAPPLY: 0.235 sec
% MAT2CELL: 0.046 sec
Timings measured on the MATLAB Online machine may not be accurate, so repeat the test locally.
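A minimal sketch of such a local test, using tic/toc as above. The variable names are illustrative, only two of the four approaches are compared, and both use find(diff()) so that the script does not depend on strfind accepting logical input. For more robust measurements, timeit with function handles is an alternative.

```matlab
C0 = {'LINE1'; ...
      '* THIS IS A COMMENT LINE'; ...
      '* THERE CAN BE MORE THAN ONE COMMENT LINE'; ...
      'LINE2'; ...
      'LINE3'};
C = repmat(C0, 10000, 1);   % same enlarged input as in the timings above

% Approach 1: loop over the blocks
tic;
limit = [true, strncmp(C, '*', 1).', true];
index = find(diff(limit));
n     = numel(index) / 2;
R1    = cell(n, 1);
for k = 1:n
    R1{k} = C(index(2*k-1):index(2*k)-1);
end
tLoop = toc;

% Approach 2: mat2cell on the non-comment rows
tic;
isCmt = strncmp(C, '*', 1);
index = find(diff([true, isCmt.', true]));
R2    = mat2cell(C(~isCmt), (index(2:2:end) - index(1:2:end)).');
tMat2cell = toc;

assert(isequal(R1, R2));    % both approaches must return the same blocks
fprintf('loop: %.3f s, mat2cell: %.3f s, %d blocks\n', tLoop, tMat2cell, n);
```

Note that in the repeated input, 'LINE3' of one copy and 'LINE1' of the next are adjacent, so all inner blocks contain three rows.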
2 comments
Jan
on 27 Jun 2019
I've edited the answer and added a splitapply and a mat2cell approach, which might be considered "vectorized".