MATLAB Answers

replace function on a tall array: "This indicates an internal error. Please contact MathWorks Technical Support."

Peng Li on 25 Mar 2020
Commented: Sean de Wolski on 31 Mar 2020
I'm trying to replace a set of codes with meaningful strings in each column of a tall array. The reference page for the replace function says it fully supports tall arrays, but I ran into errors. See the example below:
b = ["a", "6", "1001", "0", "3"]';
b = [b; b];
code = ["a"; "1001"; "3"; "10"];
meaning = ["atest", "1001t", "3b", "102b"]';
a2 = replace(b, code, meaning); % this works fine
a = replace(tall(b), code, meaning); % this throws an error
a1 = replace(tall(b'), code, meaning); % this throws another error asking me to contact technical support
First error message:
Error using tall/replace (line 21)
Incompatible tall array arguments. The first dimension in each tall array must have the same size, or have a size of 1.
It seems to complain about tall(b) because its first dimension is not 1. So I explicitly transposed it with tall(b'), which threw the error below:
Error using tall/replace (line 21)
The operation generated an invalid chunk for output parameter 1. The chunk has size [1 10] where the expected size is [4 10]. This indicates an internal error. Please contact MathWorks Technical Support.
I'm using R2020a.

  2 Comments

Sean de Wolski on 30 Mar 2020
What's your end goal? Do you want to write this back to disk with the replacements? Do you want further downstream processing? For further downstream processing, the idea is that gather will never need the entire array in memory at once. For writing, look at tall.write.
Peng Li on 30 Mar 2020
Thanks, Sean. Yes, my goal is to write the tall table to disk after replacing all 500k-by-60k entries with their specific meanings. Do you mean that if I call gather here, it doesn't need the entire array to be in memory?
I did use write to save this tall table to disk, which actually brings me back to my previous question, which I think you also kindly replied to.


Answers (2)

Jyotsna Talluri on 30 Mar 2020
You have to use the gather function to evaluate the unevaluated tall array tall(b):
a2 = replace(b, code, meaning);
a = replace(gather(tall(b)), code, meaning);
Refer to the documentation link for more details

  1 Comment

Peng Li on 30 Mar 2020
Thanks, Jyotsna. Unfortunately, I have a 500k-by-60k table. If I had to gather it here, why would I bother using a tall table?



Sean de Wolski on 30 Mar 2020
Edited: Sean de Wolski on 30 Mar 2020
At the very least it's a doc bug, because the doc says that tall arrays are fully supported for replace. It does appear to work with scalar values for old and new, but the results are not the same because the replacements then happen sequentially rather than in one shot.
I'd contact tech support for that.
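To illustrate why the sequential workaround differs, here is a minimal sketch that applies one scalar old/new pair per pass. It runs on the in-memory string array b rather than a tall array (an assumption made so the result is easy to inspect); per the reply above, replace accepts scalar old/new on tall arrays too. Because each pass sees the output of the previous ones, the pattern "10" matches inside the already-replaced "1001t".

```matlab
b = ["a", "6", "1001", "0", "3"]';
code = ["a"; "1001"; "3"; "10"];
meaning = ["atest"; "1001t"; "3b"; "102b"];

% Sequential replacement: one scalar old/new pair per pass.
% In-memory here; the same loop could be run on tall(b).
s = b;
for k = 1:numel(code)
    s = replace(s, code(k), meaning(k));
end

% Pass 2 turns "1001" into "1001t"; pass 4 then matches "10" inside it,
% so the third element ends up as "102b01t" instead of "1001t".
disp(s)
```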
However, using categorical and renamecats, I'm able to get the same result:
b = ["a", "6", "1001", "0", "3"]';
b = [b; b];
code = ["a"; "1001"; "3"; "10"];
meaning = ["atest", "1001t", "3b", "102b"]';
a2 = replace(b, code, meaning); % this works fine
tb = tall(categorical(b, unique([code;b])));
b2 = renamecats(tb,code, meaning);
bg = gather(b2); % DON'T Call this, just doing it on simple example to check.
assert(isequal(string(bg), a2))
write('test.csv', b2); % Change the pattern to what you want for writing

  3 Comments

Peng Li on 30 Mar 2020
Thanks, Sean. This looks like a great solution. I've also reached out to MathWorks, and they gave me another solution using matlab.tall.transform.
I will try and test both. Would you mind looking at the write error in my first question as well?
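The code MathWorks sent is not shown in the thread. A minimal sketch of the matlab.tall.transform idea, assuming the goal is simply to run the ordinary in-memory replace on each block of the tall array: code and meaning are captured by the anonymous function, so they are not passed to tall/replace as separate arguments. Since replace works element by element, applying it block-wise should match applying it to the whole array.

```matlab
b = ["a", "6", "1001", "0", "3"]';
b = [b; b];
code = ["a"; "1001"; "3"; "10"];
meaning = ["atest"; "1001t"; "3b"; "102b"];

tb = tall(b);

% Apply the in-memory replace to each block of tb. Each block arrives as an
% ordinary string array; code and meaning are captured by the function handle.
ta = matlab.tall.transform(@(blk) replace(blk, code, meaning), tb);

% Small example only: gather pulls the whole result into memory.
a = gather(ta);
assert(isequal(a, replace(b, code, meaning)))
```

For the real 500k-by-60k table, gather would be skipped and ta passed on to write. The function handle would then receive blocks of a table rather than a string column, so it would have to apply replace to each table variable; that adaptation is an untested assumption.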
Peng Li on 31 Mar 2020
I just tried this. renamecats works fine as long as code is a subset of the categories of tb, but my codebook may contain codes that never occur in the actual tall table. So building the categories with tb = tall(categorical(b, unique([code;b]))); becomes necessary, which looks a bit cumbersome to me.
The categorical function is supposed to handle this directly with categorical(b, code, meaning). However, it throws yet another error whenever I have only one code and one meaning.
See my second question:
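One caveat about categorical(b, code, meaning): it treats code as the complete value set, so any value of b that is not in code ("6" and "0" here) becomes <undefined> instead of staying unchanged. The sketch below shows this on the in-memory array (running the same calls on a tall array is not shown and is an assumption), plus one way to keep unmatched values by renaming only the codebook entries inside a full value set. It still relies on the unique([code; b]) step, i.e. on knowing every value that can occur.

```matlab
b = ["a", "6", "1001", "0", "3"]';
code = ["a"; "1001"; "3"; "10"];
meaning = ["atest"; "1001t"; "3b"; "102b"];

% Values of b that are not listed in code become <undefined>.
c1 = categorical(b, code, meaning);
undefinedMask = isundefined(c1);   % true for "6" and "0"

% Keep unmatched values: use every possible value as the value set and
% rename only those found in the codebook. Codes that never occur in b
% (such as "10") do not cause an error here.
vals = unique([code; b]);
names = vals;
[inCode, loc] = ismember(vals, code);
names(inCode) = meaning(loc(inCode));
c2 = categorical(b, vals, names);
result = string(c2);
```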


