Read parquet file error
Christian on 17 Aug 2021
Commented: Yongjian Feng on 3 Nov 2021
Hi,
I'm reading Parquet files in MATLAB and running into problems. For comparison, the same file reads without errors in Python using fastparquet.
The file contains 74 columns, and the output of parquetinfo already shows a problem with some of them: the lengths of the variable lists don't match (31 names and types versus 74 compression entries):
               FileSize: 66748042
           NumRowGroups: 1
        RowGroupHeights: 233382
          VariableNames: [1×31 string]
          VariableTypes: [1×31 string]
    VariableCompression: [1×74 string]
When reading the data with MATLAB, the following error occurs:
tbl = parquetread(fname)
Error using matlab.io.parquet.internal.makeParquetException>makeUnsupportedParquetTypeException (line 26)
To assign to or create a variable in a table, the number of rows must match the height of the table.
Error in matlab.io.parquet.internal.makeParquetException (line 10)
            e = makeUnsupportedParquetTypeException(e, filename);
Error in parquetread (line 128)
    e = makeParquetException(e, filename); 
I've tried to access the data directly by column name, using the names reported by parquetinfo. This works for some variables, but for others parquetread reports that the variable is not a subset of the variable names:
tbl = parquetread(fname ,'SelectedVariableNames','variableX');
Error using parquetread (line 124)
'SelectedVariableNames' value must be a unique subset of 'variable1, ....'
Using a variable from the list in this error message produces the first error again:
tbl = parquetread(fname ,'SelectedVariableNames','variable1');
me = 
  MException with properties:
    identifier: 'MATLAB:table:RowDimensionMismatch'
       message: 'To assign to or create a variable in a table, the number of rows must match the height of the table.'
         cause: {}
         stack: [3×1 struct]
    Correction: []
Then I tried to access the data using the column names reported by Python. However, this gives the error that the variable is not part of the variable list (see the second error above).
In Python, some of the column names contain a period, like "name.name1". Could this be an issue in MATLAB?
Any ideas on that?
Thanks a lot.
Christian
0 comments
Accepted Answer
Yongjian Feng on 18 Aug 2021
MATLAB can call Python scripts. Use Python to read the file and then pass the data to MATLAB. Alternatively, use Python to convert the data to CSV or JSON first, and then read that file in MATLAB.
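A minimal sketch of the second option (convert in Python, then read the CSV in MATLAB). It assumes pandas and fastparquet are installed, since fastparquet is what read the file without errors in the question. The helper names `sanitize_column_names` and `parquet_to_csv` are made up for this example, and renaming columns that contain a period is only a precaution; whether the periods are the actual cause of the MATLAB error is not confirmed.

```python
import re


def sanitize_column_names(names):
    """Return unique names containing only letters, digits, and underscores.

    Hypothetical helper: column names such as "name.name1" are rewritten so
    they are valid identifiers on the MATLAB side.
    """
    used = set()
    result = []
    for name in names:
        # Replace every character that is not an ASCII letter, digit, or "_".
        clean = re.sub(r"[^0-9A-Za-z_]", "_", str(name))
        # MATLAB identifiers must start with a letter.
        if not re.match(r"[A-Za-z]", clean):
            clean = "x" + clean
        # Add a numeric suffix if the cleaned name collides with an earlier one.
        candidate, n = clean, 1
        while candidate in used:
            candidate = f"{clean}_{n}"
            n += 1
        used.add(candidate)
        result.append(candidate)
    return result


def parquet_to_csv(src, dst, engine="fastparquet"):
    """Read a Parquet file with pandas and write it out as CSV.

    Assumption: the chosen engine (fastparquet by default) is installed.
    Returns the column names written to the CSV header.
    """
    import pandas as pd  # imported here so the helper above has no dependencies

    df = pd.read_parquet(src, engine=engine)
    df.columns = sanitize_column_names(df.columns)
    df.to_csv(dst, index=False)
    return list(df.columns)
```

With placeholder file names, calling `parquet_to_csv("data.parquet", "data.csv")` in Python and then `readtable("data.csv")` in MATLAB should give a table with the renamed columns. CSV loses type information, so it is worth checking the resulting column types after import.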
3 comments
tim che on 3 Nov 2021
Could you please explain how to read the file using Python? I have the same problem.

Yongjian Feng on 3 Nov 2021
This looks like an error shown in MATLAB, right? If so, first make sure the Python script runs successfully outside MATLAB, and then work out how to run it from MATLAB.
More Answers (0)


