Borrar filtros
Borrar filtros

Why does Readtable return NaN for values greater than 1000?

5 visualizaciones (últimos 30 días)
Kevin
Kevin el 27 de Nov. de 2023
Comentada: Kevin el 27 de Nov. de 2023
I was working with a dataset that contains the annual GDP of all countries in billion USD. However, when importing the dataset with readtable, the corresponding table omits all data points that are greater than 1000.
fileName2 = 'GDP by Country 1999-2022.csv';
GDP = readtable(fileName2);
The following is the original CSV data (see Canada)
Whereas the table GDP that readtable produces is as below
Is there a reason for this happening?
  4 comentarios
Stephen23
Stephen23 el 27 de Nov. de 2023
"the authors used double-quotes for any number greater than 1000"
Ugh.
"Any suggestions on how to convert these strings to double while importing?"
Upload a sample data file by clicking the paperclip button.

Iniciar sesión para comentar.

Respuesta aceptada

Stephen23
Stephen23 el 27 de Nov. de 2023
Editada: Stephen23 el 27 de Nov. de 2023
Ugh, double-quoting number values really is an abuse of CSV norms. Note that is not possible to simply remove/ignore the double quotes globally (i.e. throughout the file), because
  • they are also significant in the first column of text, e.g. "Bahamas, The"
  • the thousands separator is the same as the field delimiter character.
So we need a way of removing/ignoring them only from numeric values... and to know if they are numeric requires removing them first before testing if they are numeric so that the quotes can be removed... catch 22.
Ugh! The only reason someone would go out of their way to make that file more complex to parse (i.e. by adding superfluous quotes) is if they want to make everyone else's lives harder. Avoid the person who designed that file.
One approach to importing that very unfortunate datafile would be to leverage STR2DOUBLE, which by default will accept the "," thousands delimiter. We just need to merge its output with the (already) numeric data:
C = readcell('GDP by Country 1999-2022.csv');
M = str2double(C);
X = isnan(M);
X(:,1) = false;
M(X) = [C{X}] % all data are now numeric (just ignore the first column)
M = 181×24
1.0e+04 * NaN 0.1999 0.2000 0.2001 0.2002 0.2003 0.2004 0.2005 0.2006 0.2007 0.2008 0.2009 0.2010 0.2012 0.2013 0.2014 0.2015 0.2016 0.2017 0.2018 0.2019 0.2020 0.2021 0.2022 NaN 0 0 0 0.0004 0.0005 0.0006 0.0007 0.0008 0.0010 0.0012 0.0013 0.0015 0.0019 0.0022 0.0024 0 0 0 0 0 0 0 0 NaN 0.0003 0.0004 0.0004 0.0004 0.0006 0.0007 0.0008 0.0009 0.0010 0.0011 0.0012 0.0012 0.0014 0.0015 0.0016 0.0012 0.0012 0.0013 0.0014 0.0016 0.0017 0.0017 0.0018 NaN 0.0049 0.0055 0.0055 0.0057 0.0068 0.0085 0.0102 0.0114 0.0116 0.0127 0.0135 0.0155 0.0178 0.0190 0.0203 0.0175 0.0182 0.0192 0.0202 0.0211 0.0219 0.0164 0.0168 NaN 0.0006 0.0009 0.0009 0.0011 0.0014 0.0020 0.0031 0.0044 0.0055 0.0068 0.0070 0.0088 0.0118 0.0136 0.0151 0.0102 0.0099 0.0105 0.0113 0.0119 0.0127 0.0070 0.0075 NaN 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0001 0.0002 0.0002 0.0001 0.0002 NaN 0.0284 0.0284 0.0269 0.0098 0.0128 0.0152 0.0182 0.0213 0.0247 0.0278 0.0301 0.0296 0.0316 0.0329 0.0348 0.0579 0.0579 0.0590 0.0603 0.0617 0.0634 0.0455 0.0484 NaN 0.0002 0.0002 0.0002 0.0002 0.0003 0.0004 0.0005 0.0006 0.0008 0.0009 0.0009 0.0008 0.0009 0.0010 0.0011 0.0011 0.0011 0.0011 0.0012 0.0012 0.0013 0.0014 0.0015 NaN 0.0402 0.0390 0.0368 0.0413 0.0528 0.0639 0.0712 0.0755 0.0822 0.0854 0.0920 0.0983 0.1040 0.1079 0.1126 0.1241 0.1253 0.1317 0.1382 0.1442 0.1516 0.1611 0.1677 NaN 0.0213 0.0194 0.0193 0.0209 0.0256 0.0293 0.0305 0.0322 0.0349 0.0365 0.0374 0.0389 0.0407 0.0418 0.0431 0.0373 0.0386 0.0403 0.0418 0.0434 0.0453 0.0481 0.0520
M(24,2:end) % check the first row with double-quoted numbers
ans = 1×23
1.0e+03 * 0.5869 0.6443 0.5544 0.5057 0.5522 0.6636 0.8820 1.0677 1.1777 1.2516 1.4815 1.7243 1.9243 2.0362 2.1557 1.7996 1.6729 1.7593 1.8547 1.9537 2.0544 1.6458 1.8106
And if you really want it in a table:
T = array2table(M(2:end,2:end), 'VariableNames',string(M(1,2:end)), 'RowNames',C(2:end,1))
T = 180×23 table
1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ Afghanistan 0 0 0 4.084 4.585 5.971 7.309 8.399 9.892 11.513 13.318 15.406 19.248 21.555 24.304 0 0 0 0 0 0 0 0 Albania 3.444 3.695 4.096 4.456 5.6 7.452 8.376 9.133 10.163 11.131 11.726 11.797 13.808 14.91 16.053 11.591 12.204 13.214 14.341 15.553 16.996 16.77 18.012 Algeria 48.845 54.749 55.181 57.053 68.013 85.016 102.38 114.32 116.16 126.89 134.8 154.84 177.83 190.43 203.45 175.08 181.71 192.26 202.18 210.91 219.16 163.81 168.19 Angola 6.153 9.135 8.936 11.386 13.956 19.8 30.632 43.759 55.37 67.608 69.708 87.718 118.43 136.41 151.09 102.01 98.815 105.37 112.53 119.4 127.15 70.339 74.953 Antigua and Barbuda 0.652 0.678 0.71 0.718 0.754 0.818 0.875 0.962 1.026 1.074 1.18 1.196 1.322 1.404 1.494 1.285 1.328 1.386 1.458 1.536 1.617 1.405 1.534 Argentina 283.52 284.2 268.7 97.732 127.64 151.96 181.55 212.7 247.1 277.91 301.33 296.11 315.83 329.33 347.73 578.71 578.69 589.59 602.87 617.39 633.89 455.17 483.76 Armenia 1.845 1.912 2.118 2.376 2.807 3.573 4.903 6.41 7.666 8.579 8.683 8.238 9.086 9.844 10.821 10.607 10.672 11.074 11.63 12.273 12.939 13.612 15.06 Australia 401.78 390.02 368.19 412.9 527.59 639.36 712.44 754.82 822.09 854.07 920.01 982.83 1039.7 1078.9 1125.8 1240.8 1253 1317.2 1381.5 1442.5 1515.7 1610.6 1677.5 Austria 213.39 194.41 193.34 208.57 255.84 293.19 305.34 321.93 348.66 364.58 374.42 389 406.52 417.77 430.61 372.61 386.23 402.79 418.4 433.75 452.66 481.21 520.34 Azerbaijan 4.581 5.273 5.708 6.236 7.276 8.682 12.561 19.817 28.699 40.229 42.505 51.916 58.795 62.867 65.776 63.983 63.602 70.927 74.266 83.996 87.276 52.645 54.725 Bahamas, The 4.704 5.004 5.131 5.389 5.503 5.661 5.87 6.223 6.624 7.026 7.403 7.409 7.786 8.02 8.287 0 0 0 0 0 0 0 0 Bahrain 6.617 7.966 7.927 8.446 9.734 11.18 13.378 16.065 17.505 19.673 19.361 21.567 24.231 25.814 27.165 30.914 32.335 34.318 36.334 38.446 40.522 39.104 41.057 Bangladesh 46.529 47.048 47.194 49.56 54.476 59.12 61.28 65.216 71.443 77.943 92.121 99.088 114.72 123.45 133.18 202.33 218.99 238.72 260.87 285.33 310.86 355.69 390.61 Barbados 2.478 2.559 2.554 2.476 2.695 2.817 3.061 3.386 3.751 3.922 3.595 3.63 4.03 4.23 4.445 4.451 4.537 4.652 4.805 4.999 5.222 4.648 5.142 Belarus 12.104 12.758 12.421 14.654 17.823 23.141 30.131 36.944 38.185 37.323 49.043 53.137 69.146 79.416 91.501 62.02 66.37 70.997 76.023 82.437 89.315 65.754 70.632 Belgium 253.89 232.93 231.93 252.66 310.51 359.15 372.73 393.59 423.54 441.12 461.49 480.15 506 520.03 535.08 458.65 473.52 492.06 511.49 531.19 556.2 581.85 619.16
T{'Brazil',:} % check the first row with double-quoted numbers
ans = 1×23
1.0e+03 * 0.5869 0.6443 0.5544 0.5057 0.5522 0.6636 0.8820 1.0677 1.1777 1.2516 1.4815 1.7243 1.9243 2.0362 2.1557 1.7996 1.6729 1.7593 1.8547 1.9537 2.0544 1.6458 1.8106

Más respuestas (1)

Walter Roberson
Walter Roberson el 27 de Nov. de 2023
GDP = readtable(fileName2, "ThousandsSeparator",",");

Categorías

Más información sobre Logical en Help Center y File Exchange.

Etiquetas

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by