MATLAB Answers

Replace a missing string in a table

50 views (last 30 days)
Ory
Ory on 28 Sep 2016
Answered: Peter Perkins on 3 Oct 2016
I want to replace all missing strings in a table with a string of my choice, say 'unknown'. I use R2016a (without an upgrade option), so functions like fillmissing are not available to me, in case they could be of help. Eg:
dblVar = [NaN; 3; 7; 9];
cellstrVar = {'one'; 'three'; ''; 'nine'};
categoryVar = categorical({''; 'red'; 'yellow'; 'blue'});
A = table(dblVar, cellstrVar, categoryVar)
A =
dblVar cellstrVar categoryVar
______ __________ ___________
NaN 'one' <undefined>
3 'three' red
7 '' yellow
9 'nine' blue
I would like to end up with this:
A =
dblVar cellstrVar categoryVar
______ __________ ___________
NaN 'one' unknown
3 'three' red
7 'unknown' yellow
9 'nine' blue
Note I also replaced the categorical '<undefined>' as well, if you can please include in your answer.
Is there a way to do this without changing A's structure, eg from table to cell, in the process? The reason I want to avoid the transformation is my table is large, and transformation may cause memory issues.
Edit to add: the location of the missing string value has to be identified as well, there may be several such columns in the table.
Many thanks.

  0 Comments

Sign in to comment.

Accepted Answer

George
George on 28 Sep 2016
This should work:
dblVar = [NaN; 3; 7; 9];
cellstrVar = {'one'; 'three'; ''; 'nine'};
categoryVar = categorical({''; 'red'; 'yellow'; 'blue'});
cellstrVar2 = {'four'; 'none'; '7'; ''};
A = table(dblVar, cellstrVar, categoryVar, cellstrVar2);
varNames = A.Properties.VariableNames;
for ii = 1:numel(varNames)
if iscellstr(A{1,varNames{ii}})
undefloc = strcmp(A.(ii), '');
A{undefloc, ii} = cellstr('unknown');
end
if iscategorical(A{1, varNames{ii}})
undefloc = isundefined(A{:,ii});
A{undefloc, ii} = categorical(cellstr('unknown'));
end
end
A =
dblVar cellstrVar categoryVar cellstrVar2
______ __________ ___________ ___________
NaN 'one' unknown 'four'
3 'three' red 'none'
7 'unknown' yellow '7'
9 'nine' blue 'unknown'
You can use the undefloc variable to find where things were undefined or empty strings.

  4 Comments

Show 1 older comment
George
George on 29 Sep 2016
If you know what categories are going to be cell / categorical arrays you don't need to use the loop.
dblVar = [NaN; 3; 7; 9];
cellstrVar = {'one'; 'three'; ''; 'nine'};
categoryVar = categorical({''; 'red'; 'yellow'; 'blue'});
cellstrVar2 = {'four'; 'none'; '7'; ''};
A = table(dblVar, cellstrVar, categoryVar, cellstrVar2);
undefloc = strcmp(A(:, {cellstrVar, cellstrVar2}), '');
A{undefloc, {cellstrVar, cellstrVar2}} = cellstr('unknown');
undefloc = isundefined(A{:,categoryVar});
A{undefloc, categoryVar} = categorical(cellstr('unknown'));
Ory
Ory on 29 Sep 2016
Awesome! If not known, one can get to the classes index by, say for cell strings:
catInd = strcmp(varfun(@class, A, 'OutputFormat', 'cell'), 'cell');
George
George on 29 Sep 2016
There you go. If you do that and slam the cells and categoricals into cell arrays you can use the curly brace syntax on the cell array, rather than the variable name.

Sign in to comment.

More Answers (1)

Peter Perkins
Peter Perkins on 3 Oct 2016
George's loop seems fine to me although you could tweak it a bit as
for name = varNames
var = A.name;
if iscellstr(var)
var(strcmp(var,'')) = {'unknown'};
elseif iscategorical(A.(name))
var(isundefined(var) = 'unknown';
end
A.Name = var;
end
If you're willing to write a couple of small functions, you can do this:
theCellStrs = varfun(@iscellstr,A);
A(:,theCellStrs) = varfun(@replaceEmptyString,A(:,theCellStrs));
function c = replaceEmptyString(c)
c(strcmp(c,'') = {'Unknown'};
(and similarly for categorical) but varfun uses a loop underneath.

  0 Comments

Sign in to comment.

Sign in to answer this question.


Translated by