Frequency words for each labels

4 visualizaciones (últimos 30 días)
Rachele Franceschini
Rachele Franceschini el 7 de Jul. de 2022
Comentada: Rachele Franceschini el 7 de Jul. de 2022
I have one dataset with two columns: text and data. The data is made up two labels 0 and 1. I would like to calculate the frequency of each word for each labels. I mean, how many time, for example "damage" there is within class 1 and 0? How can I do? Furthermore, I don't understand if I have to, however, use tokens or no. Maybe I can use a cicle for? I don't know it.
Here there is a little image with a similar result. I would like a similar table.

Respuesta aceptada

Karim
Karim el 7 de Jul. de 2022
Editada: Karim el 7 de Jul. de 2022
Edit to make so that the code works with the latter added example data...
% read the file
data = readtable("dati_classificati.xlsx",'TextType','string');
% split each sentence into words, assuming that spaces are used as delimiter...
cell_text = arrayfun(@(x) data.text(x,:),1:size(data.text,1),'UniformOutput',false)';
cell_text = cellfun(@(x) split(x,' '), cell_text,'UniformOutput',false);
% count the number of words in each sentence
numWords = cellfun(@numel, cell_text);
% expand the labels to match the number of words for each sentence
expandedLabels = repelem( data.label ,numWords);
% gather the words in 1 big string array
expandedWords = vertcat(cell_text{:});
% list a few words to count the frequency...
MyWords = ["strada" "il" "Via" "donne" "della"];
% allocate a table for the results
varTypes = ["string","double","double"]; % data type for each column
varNames = ["Words","Ones","Zeros"]; % variable name for each column
MyResult = table('Size',[numel(MyWords) 3],'VariableTypes',varTypes,'VariableNames',varNames);
MyResult.Words = MyWords(:);
% count the labels for each word
for i = 1:numel(MyWords)
currLabels = expandedLabels( contains(expandedWords,MyResult.Words(i)) );
MyResult.Ones(i) = sum(currLabels==1);
MyResult.Zeros(i) = sum(currLabels==0);
end
% display the results
MyResult
MyResult = 5×3 table
Words Ones Zeros ________ ____ _____ "strada" 48 1 "il" 34 20 "Via" 53 0 "donne" 0 2 "della" 3 14
  9 comentarios
Karim
Karim el 7 de Jul. de 2022
I modified the original answer accoring to the file you provided, see at the top. Note that i just used the raw text and only included a few words. But normally now you see how the concept works.
Rachele Franceschini
Rachele Franceschini el 7 de Jul. de 2022
VERY VERY thank you!!!!Thank you so much!!I tried also with pre-process and it is ok!

Iniciar sesión para comentar.

Más respuestas (0)

Categorías

Más información sobre Timetables en Help Center y File Exchange.

Productos


Versión

R2021b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by