Finding unique set from large dataset

Hello!
How would one go about finding, with MATLAB code, the set of unique characters used in a large dataset?
For example, if the English dictionary is my large dataset, I want the output to be the 26 letters of the alphabet, i.e. the unique characters used in the dataset.
Another example:
If x = {"abc", "bcd", "ded"}
I want the output as {"a","b","c","d","e"}
Thanks in advance!

Accepted Answer

madhan ravi on 18 Jul 2019
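% Split each element into single characters ('.' matches any one character)
a = cellfun(@(z) regexp(z, '.', 'match'), x, 'UniformOutput', false);
% Concatenate all characters, keep the unique ones, return them as a cell array
Output = num2cell(unique([a{:}]))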

5 Comments

KALYAN ACHARJYA on 18 Jul 2019
Edited: KALYAN ACHARJYA on 18 Jul 2019
+1
@Madhan, could we also do it this way?
x = {"abc", "bcd", "ded"};
s = convertStringsToChars(strcat(x{:}))
data = num2cell(s);
% Next, remove all duplicate cell elements
madhan ravi on 18 Jul 2019
Edited: madhan ravi on 18 Jul 2019
Thanks Kalyan :)
Perhaps:
z = regexp(strcat(x{:}),'.','match');
Output = num2cell(unique(z))
Edit:
Kalyan, your method is in fact better; I didn't verify it properly before.
Guillaume
Yes, Kalyan, your approach is actually better. Using a regular expression to extract individual characters is a bit overkill.
A few notes:
The OP shows a cell array of scalar string arrays. Wrapping each scalar string in its own cell is inefficient and serves no purpose. It should be either a string array:
x = ["abc", "bcd", "dedf"];   % a string array
or a cell array of char vectors:
x = {'abc', 'bcd', 'dedf'};   % cell array of char vectors (cellstr)
The string array will use less memory.
If starting with a string array, it can be simply:
x = ["abc", "bcd", "dedf"];   % a string array
unique(char(join(x, '')))
If starting with a cell array of char vectors:
x = {'abc', 'bcd', 'dedf'};
unique([x{:}])
madhan ravi on 18 Jul 2019
Edited: madhan ravi on 18 Jul 2019
Thank you Guillaume :)
Sanjana Sankar on 19 Jul 2019
Thank you all. I was looking for the output from Guillaume's method. Thanks a lot!!
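
For readers who want the cell-of-strings output shown in the question rather than a single char vector, the comments above can be combined roughly as follows. This is only a sketch: it assumes every element of x can be converted by string(), and the variable names (s, allChars, u, asStrings, asCell) are made up for illustration.

```matlab
x = {"abc", "bcd", "ded"};           % input exactly as written in the question

s = string(x);                       % also accepts a cellstr or a string array
allChars = char(join(s(:), ""));     % one char row vector: 'abcbcdded'
u = unique(allChars);                % sorted unique characters: 'abcde'

asStrings = string(num2cell(u));     % 1x5 string array, one character per element
asCell = num2cell(asStrings);        % 1x5 cell array of string scalars
```

Stopping at u gives the char-vector result from Guillaume's comment; the last two lines only repackage it.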


More Answers (2)

Bruno Luong on 18 Jul 2019
Edited: Bruno Luong on 18 Jul 2019
x = ["abc", "bcd", "ded"]   % no need to use curly braces for strings
string(unique(cat(2, x{:}))')'   % unique characters as a 1-by-N string array
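
If the dataset is too large to concatenate in one go, the same idea can probably be applied in chunks, keeping only the unique characters seen so far. The following is a minimal sketch under that assumption; words, chunkSize, and seen are illustrative names, and the chunk size of 2 is chosen only to keep the example small.

```matlab
words = ["abc", "bcd", "ded"];   % stand-in for a much larger string array
chunkSize = 2;                   % assumed value, for illustration only
seen = '';                       % unique characters found so far

for k = 1:chunkSize:numel(words)
    idx = k:min(k + chunkSize - 1, numel(words));
    chunk = char(join(words(idx), ""));   % characters of this chunk only
    seen = unique([seen, chunk]);         % merge and keep the sorted unique set
end

disp(seen)
```

Because seen can never grow beyond the number of distinct characters, the temporary memory is bounded by the size of one chunk rather than the whole dataset.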

Asked: 18 Jul 2019
Commented: 19 Jul 2019