Platform: Windows 10 64-bit, 4 cores, 500 GB SSD, 32 GB RAM
File to import: 80 MB, containing comma-delimited numbers and short character arrays. Overall format: 27 columns x 400K lines.
Clicking the import data tool icon prompts for a file name, shows a message "Opening a large text file ..." and then displays the file contents in a table format on the GUI after ~5 seconds. The table displayed matches the file contents.
On the same GUI I set the output type to cell array and the range to A2:AA395201 (the entire file minus the header line). Clicking the Import Selection button displays the message "Importing Data..." and a status bar that stays gray for 35 minutes before it suddenly disappears. At that point the import is complete and the variable name appears in the workspace.
Why does the initial "Opening a large text file" step finish in 5 seconds while the import takes 35 minutes? For that first step to complete and display the data in the GUI table, it seems it must essentially have imported the data already, yet roughly 400X faster (about 5 seconds versus 35 minutes).

Accepted Answer

Yair Altman on 13 Dec 2020

The Import Tool GUI only shows you a preview of the data, based on the top N lines in the file. It does not read and process the entire file. Only when you click the <Import Selection> button is the entire file processed, using the range that you selected and the file format detected by the preview. This naturally takes much longer than the preview processing.

4 Comments

Brad_EE on 13 Dec 2020
Although only the first 35 lines are displayed in the viewable portion of the preview table, I can scroll to the bottom of the table and see that all 400K lines have been read. It only takes a few seconds to scroll to the bottom.
Yair Altman on 13 Dec 2020
Each time that you scroll, MATLAB only needs to read and process a small number of lines. What you see is called "perceived performance"; it does not mean that the entire file is in fact loaded at once. The very fact that there is a small lag of a few seconds each time that you scroll tells you that this part of the file is being read and processed at that moment (otherwise it would have been displayed immediately).
Brad_EE on 13 Dec 2020
Perhaps it is reading 35 lines at a time based on the relative position of the scroll bar. So, for instance, if the scroll bar is pulled down 3/4 of its length, only 35 lines from that location in the file are actually read and displayed.
Yair Altman on 13 Dec 2020
Yes, this is exactly what I meant
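
The windowed reading agreed on in this exchange can be illustrated with a short sketch. It is written in Python purely for illustration and is not how the MATLAB Import Tool is actually implemented; the function name `read_window`, the default of 35 lines, and the byte-offset seek are all assumptions made for the example. The idea is to jump to a byte position proportional to the scroll bar, discard the partial line found there, and parse only the next few lines, so the cost of each scroll does not depend on the file size.

```python
import os


def read_window(path, fraction, n_lines=35):
    """Parse up to n_lines comma-delimited rows starting near `fraction` of the file.

    `fraction` plays the role of the scroll bar position (0.0 = top, 1.0 = bottom).
    Only the requested window is read and split; the rest of the file is never touched.
    """
    size = os.path.getsize(path)
    offset = int(size * fraction)
    rows = []
    with open(path, "rb") as f:
        if offset > 0:
            # Step back one byte and read to the end of that line, so the
            # next read starts at the beginning of a complete line.
            f.seek(offset - 1)
            f.readline()
        for _ in range(n_lines):
            line = f.readline()
            if not line:
                break  # reached the end of the file
            rows.append(line.decode("utf-8").rstrip("\r\n").split(","))
    return rows
```

Calling `read_window("data.csv", 0.75)` (with a hypothetical file name) would return 35 rows from about three quarters of the way through the file. A full import, by contrast, has to parse and convert every one of the 400K lines and all 27 columns into the requested output type, which is why it cannot be compared directly with the preview.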

Asked: 11 Dec 2020

Commented: 13 Dec 2020