Finding mode of each row in an array of Strings

Manas on 12 Aug 2024
Answered: Steven Lord on 14 Aug 2024
Currently I have an array with 3 columns and a lot of rows (about 50,000). Each value is a string, and I essentially want to compare the 3 values in each row and find the most common one.
Say my input table looked like the following
Apple Banana Apple
Cherry Cherry Apple
Mango Mango Mango
My outputs would be
Apple
Cherry
Mango
Please let me know if you have any advice. I have tried mode, but it does not work for strings.

Accepted Answer

Naga on 12 Aug 2024
Dear Manas,
I understand you have a large array with 3 columns and many rows, where each value is a string. You want to find the most common string in each row and output these values. Here's how you can do this in MATLAB.
  1. Define the sample data as a cell array.
  2. Use 'arrayfun' to apply the 'mostCommon' function to each row of the data.
  3. Output the results using disp.
% Sample data
data = {
'Apple', 'Banana', 'Apple';
'Cherry', 'Cherry', 'Apple';
'Mango', 'Mango', 'Mango'
};
% Apply the function to each row and store results
mostCommonValues = arrayfun(@(i) mostCommon(data(i,:)), 1:size(data, 1), 'UniformOutput', false);
% Display the results
disp(mostCommonValues);
{'Apple'} {'Cherry'} {'Mango'}
% Function to find the most common element in a cell array row
function commonValue = mostCommon(cellRow)
[uniqueElements, ~, idx] = unique(cellRow);
counts = accumarray(idx, 1);
[~, maxIdx] = max(counts);
commonValue = uniqueElements{maxIdx};
end
This approach should handle large datasets like the one you mentioned with 50,000 rows, although 'arrayfun' still calls 'mostCommon' once per row.
Please refer to the MATLAB documentation to learn more about the function 'arrayfun'.
Hope this helps you!
  1 Comment
Manas on 14 Aug 2024
This worked really well, but do you know if there is any way to ignore blank cells? For example, if the row is ["Apple", "", ""], it should return "Apple".
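One way to skip blanks with the row-by-row approaches in this thread is to drop empty entries from each row before counting. The following is a minimal sketch rather than code posted by any of the answerers; the variable names are illustrative, and it assumes that blanks are zero-length strings and that a row containing only blanks should yield "".

```matlab
% Sample data with blank entries (illustrative, not from the original post)
data = {'Apple', '', ''; 'Cherry', 'Cherry', 'Apple'; '', 'Mango', 'Mango'};

N = size(data, 1);
result = strings(N, 1);              % rows with only blanks stay ""
for i = 1:N
    row = string(data(i, :));        % convert the cell row to a string array
    row = row(strlength(row) > 0);   % drop blank entries
    if ~isempty(row)
        [u, ~, idx] = unique(row);   % idx maps each entry to its position in u
        result(i) = u(mode(idx));    % ties go to the alphabetically first value
    end
end
disp(result)
```

Because the blanks are removed before unique is called, they can never win the count, even when they outnumber the real values in a row.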


More Answers (2)

Steven Lord on 14 Aug 2024
If these strings represent data from one of several values in a category, consider storing the data as a categorical array.
str = ["Apple" "Banana" "Apple"; "Cherry" "Cherry" "Apple"; "Mango" "Mango" "Mango"];
C = categorical(str)
C = 3x3 categorical array
     Apple     Banana    Apple
     Cherry    Cherry    Apple
     Mango     Mango     Mango
What fruits (categories) are present in C?
whichFruits = categories(C)
whichFruits = 4x1 cell array
    {'Apple' }
    {'Banana'}
    {'Cherry'}
    {'Mango' }
Can we ask for the most common category in each row?
M = mode(C, 2)
M = 3x1 categorical array
     Apple
     Cherry
     Mango
Does this work even if there's a missing value in C?
C(2, 2) = missing
C = 3x3 categorical array
     Apple     Banana         Apple
     Cherry    <undefined>    Apple
     Mango     Mango          Mango
mode(C, 2)
ans = 3x1 categorical array
     Apple
     Apple
     Mango
Now in row 2, Apple and Cherry occur equally frequently, but Apple comes first in the list of categories so it's the mode. [Apple (pi) a la mode? ;)]
Can we figure out how many elements of each category are in each row?
[counts, fruit] = histcounts(C(1, :))
counts = 1x4
2 1 0 0
fruit = 1x4 cell array
{'Apple'} {'Banana'} {'Cherry'} {'Mango'}
or:
counts = countcats(C(1, :)) % No second output, returns counts in categories() order
counts = 1x4
2 1 0 0
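The categorical approach above can also be applied to blank strings. The following is a minimal sketch under stated assumptions, not part of the original answer: it assumes blanks are zero-length strings, marks them as missing before conversion so that they become <undefined>, and relies on the behavior shown above, where mode does not count undefined elements. The sample data is illustrative.

```matlab
% Illustrative data with blank strings
str = ["Apple" "" ""; "Cherry" "Cherry" "Apple"; "Mango" "Mango" ""];

str(strlength(str) == 0) = missing;   % mark blank strings as missing
C = categorical(str);                 % missing strings become <undefined>
M = mode(C, 2);                       % row-wise mode; undefined entries are not counted
modes = string(M);                    % convert back to a string column vector
disp(modes)
```

This keeps the computation in a single call to mode over the whole array instead of visiting each row separately.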

Voss
Voss el 12 de Ag. de 2024
str = ["Apple" "Banana" "Apple"; "Cherry" "Cherry" "Apple"; "Mango" "Mango" "Mango"]
str = 3x3 string array
    "Apple"     "Banana"    "Apple"
    "Cherry"    "Cherry"    "Apple"
    "Mango"     "Mango"     "Mango"
N = size(str,1);
modes = strings(N,1);
for ii = 1:N
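[u,~,idx] = unique(str(ii,:)); % idx indexes into the sorted unique values u, not into the row
modes(ii) = u(mode(idx));      % so look up the mode in u (e.g. ["Banana" "Apple" "Apple"] gives "Apple")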
end
disp(modes)
    "Apple"
    "Cherry"
    "Mango"
  2 Comments
Manas on 14 Aug 2024
This was helpful and works, but using a for loop is slightly slower, so it was a bit impractical for me. Thanks though.
Voss on 14 Aug 2024
You're welcome!
arrayfun is also a for loop.
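Since both the for loop and arrayfun visit the rows one at a time, a loop-free alternative may be worth trying when speed matters. For exactly three columns, a minimal sketch (not taken from any of the answers) can use whole-column comparisons: if columns 2 and 3 agree, that value occurs at least twice; otherwise column 1 is either a majority value or part of a three-way tie. Note the assumption that ties are resolved in favor of the first column, which differs from the alphabetical tie-breaking of mode. The fourth sample row is added here for illustration.

```matlab
% Sample data; the last row is an extra illustrative case
str = ["Apple" "Banana" "Apple"; "Cherry" "Cherry" "Apple"; ...
       "Mango" "Mango" "Mango"; "Banana" "Apple" "Apple"];

modes = str(:, 1);                 % default: take the first column
agree = str(:, 2) == str(:, 3);    % rows where columns 2 and 3 match
modes(agree) = str(agree, 2);      % a matching pair is always a mode
disp(modes)
```

This trick is specific to three columns; for wider arrays, the categorical mode(C, 2) approach in the other answer generalizes more naturally.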


Release

R2023b
