unique is giving the same expression twice

Question

Wesso on 29 Jan 2021


https://es.mathworks.com/matlabcentral/answers/730793-unique-is-giving-the-same-expression-twice

Edited: dpb on 29 Jan 2021

Accepted Answer: dpb

matlab.zip

Hi,

(data is attached)

[Country,~,ix] = unique(A);

tally = accumarray(ix, 1);

Q2= table(Country, tally);

Q2 contains the same expression twice among the unique values: 'Audit and assurance, and tax services'. What could be the reason, and how can I overcome it? Is it a bug?


Steven Lord on 29 Jan 2021

They may look the same, but can you prove they're stored the same? Store two of the expressions that look identical in separate variables x and y then run the following code and show us the results.

disp(x)
disp(y)
isequal(x, y)
whos x y
x==y % only if x and y are the same size

dpb on 29 Jan 2021

Edited: dpb on 29 Jan 2021

This undoubtedly is the same issue I pointed out before at https://www.mathworks.com/matlabcentral/answers/730643-replacing-999-in-a-table-to-nan-regardless-of-the-type-of-the-column?s_tid=srchtitle#comment_1294958 where the encoding is different. Thus the strings visually appear the same, but one contains a double-byte character and the other doesn't.

Here's the specifics to show what was there for that particular set of values I looked at; undoubtedly you'll find the same thing here if you look carefully...

>> sort(categories(Final.org04b))
ans =
  46×1 cell array
    {'-999'                                     }
    {'-9999'                                    }
...
    {'I don't know'                             }
    {'I don’t know'                             }
...
>> tmp=ans(42:43)
tmp =
  2×1 cell array
    {'I don't know'}
    {'I don’t know'}
>> strcmp(tmp(1),tmp(2))
ans =
  logical
   0
>> [double(tmp{1});double(tmp{2})]
ans =
   73    32    100    111    110     39    116   32   107    110    111   119
   73    32    100    111    110   8217    116   32   107    110    111   119
>> 

NB: the second string contains the extended character 8217 (U+2019, a right single quotation mark) instead of ASCII 39, the ordinary single quote.
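To locate this kind of mismatch without comparing strings pair by pair, one option is to flag every label that contains a code point above 127. The following is only a sketch: the `labels` cell array is a hypothetical stand-in for the output of `categories(...)`, not the poster's data.

```matlab
% Hypothetical sample: two labels that differ only in the apostrophe.
labels = {'I don''t know'; ['I don', char(8217), 't know']};

% Flag every label that contains a code point above 127 (non-ASCII).
hasNonAscii = cellfun(@(s) any(double(s) > 127), labels);

% List the offending code points for each flagged label.
for k = find(hasNonAscii).'
    codes = unique(double(labels{k}(double(labels{k}) > 127)));
    fprintf('%d: %s -> code points %s\n', k, labels{k}, mat2str(codes));
end
```

Any label that is flagged is a candidate for normalization before calling unique.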


Answer 1

dpb on 29 Jan 2021


https://es.mathworks.com/matlabcentral/answers/730793-unique-is-giving-the-same-expression-twice#answer_609873

Edited: dpb on 29 Jan 2021

I didn't notice the data attached for this case -- the same exercise as above shows:

>> sort(categories(A))
ans =
  29×1 cell array
    {'Agriculture and fishing'                              }
    {'Audit and assurance, and tax services'                }
    {'Audit and assurance, and tax services'                }
    {'Banking and capital markets'                          }
    {'Civil Societies/NGOs'                                 }
    {'Civil society/NGOs'                                   }
    {'Construction'                                         }
    {'Consulting services'                                  }
    {'Education and academia'                               }
    {'Electronics'                                          }
    {'Energy, utilities and resources'                      }
    {'Financial services'                                   }
    {'Food Services'                                        }
    {'Government and public services'                       }
    {'Health and healthcare services'                       }
    {'Hospitality'                                          }
    {'IT and telecommunications'                            }
    {'Manufacturing'                                        }
    {'Mining and Quarrying'                                 }
    {'Oil and gas'                                          }
    {'Other'                                                }
    {'Other business services'                              }
    {'Other business services, please specify: ____________'}
    {'Petrochemicals'                                       }
    {'Real Estate'                                          }
    {'Tourism'                                              }
    {'Transportation and logistics'                         }
    {'Wholesale and retail trade'                           }
    {'org03'                                                }
>> tmp=ans(2:3)
tmp =
  2×1 cell array
    {'Audit and assurance, and tax services'}
    {'Audit and assurance, and tax services'}
>> 

There's an extended character (code 160, a no-break space) in the second where there's an ordinary space (code 32) in the first:

>> find(tmp{1}~=tmp{2})
ans =
    25
>> [double(tmp{1}(25));double(tmp{2}(25))]
ans =
    32
   160
>> 
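One possible way to overcome this particular duplicate is to replace the no-break space with an ordinary space before tallying. This is a minimal sketch, assuming A is a categorical array as in the question; the three-element A below is a hypothetical stand-in for the attached data.

```matlab
% Hypothetical sample standing in for the attached data.
nbsp = char(160);
A = categorical({'Audit and assurance, and tax services'; ...
                 ['Audit and assurance, and', nbsp, 'tax services']; ...
                 'Construction'});

% Convert to text, replace no-break spaces with ordinary spaces, then trim.
txt = strtrim(strrep(cellstr(A), nbsp, ' '));
A2 = categorical(txt);

% Same tally as in the question, now on the cleaned data.
[Country, ~, ix] = unique(A2);
tally = accumarray(ix, 1);
Q2 = table(Country, tally)
```

With the cleaned labels, the two visually identical entries collapse into a single row of Q2.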

Besides that, there are other anomalous entries as well, just as were pointed out for the other categorical array in the previous question:

...
{'Civil Societies/NGOs'                                 }
{'Civil society/NGOs'                                   }
...
{'Other business services'                              }
{'Other business services, please specify: ____________'}
...

that need to be cleaned up or one will never be able to match all elements of what are obviously intended to be the same categories but are not.

The data need a thorough cleaning before being ready for prime time.
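As one way to do part of that cleanup, near-duplicate categories can be combined with mergecats, which merges the listed categories into the first one named. This is a sketch only: the sample array is hypothetical, and which spelling to keep (and whether the "please specify" variant really belongs with 'Other business services') is an assumption that has to be checked against the survey.

```matlab
% Hypothetical sample with the near-duplicate categories from the listing above.
A = categorical({'Civil Societies/NGOs'; 'Civil society/NGOs'; ...
                 'Other business services'; ...
                 'Other business services, please specify: ____________'});

% Merge each group of variants into the first name listed.
A = mergecats(A, {'Civil society/NGOs', 'Civil Societies/NGOs'});
A = mergecats(A, {'Other business services', ...
                  'Other business services, please specify: ____________'});

categories(A)
```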
