Why does 'readtable' generate NaN values for text data imported from a file in R2020a or later?
243 visualizaciones (últimos 30 días)
Mostrar comentarios más antiguos
MathWorks Support Team
el 23 de Nov. de 2021
Editada: MathWorks Support Team
el 23 de Nov. de 2021
I'm reading in an Excel spreadsheet file that has columns containing numerical and text data.
Depending on the order of the values, 'readtable' appears to choose different data types.
The data in the spreadsheet is:
COL_1 | COL_2
TEXT_1 | 2
TEXT_2 | TEXT
-1 | 2
'readtable' returns the following data:
>> readtable('MixedInputs.xlsx')
ans =
3×2 table
COL_1 COL_2
________ _____
{'TEXT_1'} 2
{'TEXT_2'} NaN
{'-1' } 2
COL_1 imports all data as character vectors.
COL_2 has broken data 'NaN' where "TEXT" should be read.
Is this expected behavior? I was expecting 'readtable' to revert all data to character vectors or strings, for any column containing mixed data types.
Respuesta aceptada
MathWorks Support Team
el 23 de Nov. de 2021
Editada: MathWorks Support Team
el 23 de Nov. de 2021
This is the expected behavior in R2020a or later. The 'readtable' function determined that the data type appropriate for the values detected in the second column is a numeric array. However "TEXT" is not numeric. Then, 'readtable' fills in the gaps with appropriate missing values — a 'NaN' value for numeric variables, and an empty character vector for text.
Option 1:
To get a table with character vector columns or string columns, use the 'setvartype' file import option:
filename = 'MixedInputs.xlsx';
opts = detectImportOptions(filename);
opts = setvartype(opts,'char'); % or 'string'
T = readtable(filename,opts)
T =
3×2 table
COL_1 COL_2
________ ________
{'TEXT_1'} {'2' }
{'TEXT_2'} {'TEXT'}
{'-1' } {'2' }
Refer to the following documentation page for more information regarding 'setvartype':
Option 2:
If your table data has text headers, as is the case here, you can restore the default behavior from previous releases by specifying the ('Format','auto') name-value pair argument. 'readtable' then reads the headers as data, and as a result, it converts all the table variables to text:
>> T = readtable('MixedInputs.xlsx','Format','auto')
T =
3×2 table
COL_1 COL_2
________ ________
{'TEXT_1'} {'2' }
{'TEXT_2'} {'TEXT'}
{'-1' } {'2' }
Refer to the following documentation page for more information regarding name-value pair arguments for text files:
Option 3:
If there are strings and numbers mixed in a single column, then 'readtable' may not be the right function. 'readcell' might be better to represent the data as mixed cell array:
>> C = readcell('MixedInputs.xlsx')
C =
4×2 cell array
{'COL_1'} {'COL_2'}
{'TEXT_1' } {[ 2]}
{'TEXT_2' } {'TEXT' }
{[ -1]} {[ 2]}
Refer to the following documentation page for more information about 'readcell':
0 comentarios
Más respuestas (0)
Ver también
Categorías
Más información sobre Spreadsheets en Help Center y File Exchange.
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!