- "I am using xlsread"   However, the code doesn't show that you use xlsread. Why is that?
- How should I interpret   "raw 528439x20 1268253582 cell" ?
- What does   whos raw   show?
Insufficient storage (activex) - unable to complete operation - pulling in excel data more quickly.
    4 visualizaciones (últimos 30 días)
  
       Mostrar comentarios más antiguos
    
Hi All,
I have some reasonably large excel files to process (200-800mb). After a bit of dabbling and on review of the numerous threads here on the forum I have elected to use activex controls etc.
For speed of execution I am using xlsread (as opposed to csvread / importdata). Textscan is less than ideal due the variation between sheets (i.e. requires a format specifier).
To make things easier I have been reading from the excel sheets in bite sized chunks to 1) help speed up the read and 2) to avoid (other) memory errors.
Presently I am facing a challenges with the read call, that is; eRange = get(e.Activesheet,'Range',sheetR).
Currently the script loops through a 'block' of 20 columns or so reading in all rows. On the first call
that is:
eRange = get(e.Activesheet,'Range',sheetR).
raw = eRange.Value;
the program executes as expected. However looping through on the succeeding call and loading the next block I receive the error
"Error: Not enough storage is available to complete this operation."
Below is a brief outline of the script and corresponding output.
-------------- Script ------------------
% Uses activeX controls. 
% pathAndFilename = 'C:\Users\owner\Desktop\Convert GS logs\temp.xlsx'
%%Open excel application
disp('Opening Excel application...')
e=actxserver('excel.application');
wb=e.Workbooks;
w=Open(wb, pathAndFilename');
%%Find range / break range of sheet into managable chunks
% Define excel column headers (NOT r1c1 reference style..)
Alphabet=('A':'Z').'; 
[I,J] = meshgrid(1:26,1:26);
XX=[Alphabet(I(:)), Alphabet(J(:))];
XX=cellstr(strvcat(Alphabet,XX));
clear I J Alphabet
% Range
disp('Extracting range from excel sheet...')
RangeInString=e.ActiveSheet.UsedRange.Address;
range = strsplit(RangeInString,'$');
nrow = str2double(range(end));
ncol = find(strcmp(range(end-1),XX));
% Readuce the scale of the succeeding reads (Time / memory overheads). 
if ncol > 20
    splt = ceil(ncol/lump); 
    bite = floor(ncol / splt);
    columns = [1:bite:ncol];
    surp_col = ncol - columns(end);
    columns(end) = columns(end) + surp_col;
end
clear splt surp_col bite range
%%Do some work
% loop through the sheet and extract the respective chunks
for i=1:length(columns)-1
    disp(['processing columns ',num2str(columns(i)), ' through ' num2str(columns(i+1)),' of ',num2str(ncol), ' columns...'])
    tic
    % set the range for the current read / chunk
    sheetR = [XX{columns(i),:},'1:',XX{columns(i+1),:},num2str(nrow)];
    %
    % eRange = get(e.Activesheet,'Range','A1:B2');
    eRange = get(e.Activesheet,'Range',sheetR);
    raw = eRange.Value;
    clear raw sheetR
end
What I wish to achieve is to read the full xlsx file and place the data in a .mat. To reduce the overhead, on each loop the data 'raw' is written to a .mat using the -7.3 format. All non essential variable are removed using a 'clearvars -except' call.
The scrip runs fine of the first 'chunk', however I receive an error on the following iteration. What is odd to me is that each 'chunk' pulled in on the read call is precisely the same size (number of rows / columns).
The output of the script is as follows:
Loading log file: 20160719_0945.xlsx ...
Opening Excel application...
Extracting range from excel sheet...
processing columns 1 through 20 of 152 columns...
processing columns 20 through 39 of 152 columns...
Error: Not enough storage is available to complete this operation.
The script fails on the 'raw = eRange.Value' call.
A 'whos' on the preceding line (prior to the 'raw = eRange.Value') suggests little memory overhead.
    Name                   Size            Bytes  Class                                             Attributes
    MatObj                 1x1               112  matlab.io.MatFile                                           
    MetaData               1x1              3069  struct                                                      
    RangeInString          1x15               30  char                                                        
    UAVnav_Log_ID          1x1               138  cell                                                        
    XX                   702x1             81380  cell                                                        
    columns                1x8                64  double                                                      
    e                      1x1                    COM.excel_application                                       
    eRange                 1x1                    Interface.00020846_0000_0000_C000_000000000046              
    filename               1x18               36  char                                                        
    i                      1x1                 8  double                                                      
    lump                   1x1                 8  double                                                      
    matf                   1x13               26  char                                                        
    ncol                   1x1                 8  double                                                      
    nrow                   1x1                 8  double                                                      
    pathAndFilename        1x71              142  char                                                        
    w                      1x1                    Interface.000208DA_0000_0000_C000_000000000046              
    wb                     1x1                    Interface.000208DB_0000_0000_C000_000000000046
I am presently running the script on a 64bit machine with 8Gb of RAM. However, I would very much like to deploy this / similar scripts in the future on a smaller machine for those times when we are 'working in the field'. Any pointers would be greatly appreciated. A simple solution is to significantly reduce the number of columns read during any one call - however, surely there is a better way?
The example file I am running has 530,000 rows (530,000 rows x 158 columns) and generates the cell array;
raw 528439x20 1268253582 cell
2 comentarios
  per isakson
      
      
 el 6 de Ag. de 2016
				
      Editada: per isakson
      
      
 el 10 de Ag. de 2016
  
			Respuestas (0)
Ver también
Categorías
				Más información sobre ActiveX en Help Center y File Exchange.
			
	Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!

