Info

# My nested For Loops takes 40 seconds for 20,000 records. Anyway to vectorize or improve?

1 view (last 30 days)
Mark McGrath on 15 Jun 2020
Closed: MATLAB Answer Bot on 20 Aug 2021
% Consolidate Results by Security and Date
uniqueDate = unique(d.RiskData(:,2));
uniqueSec = unique(d.RiskData(:,1));
numSecs = length(unique(d.RiskData(:,1)));
numDates = length(unique(d.RiskData(:,2)));
d.RiskDataNew{(numDates * numSecs) + 1,size(d.RiskData,2)} =[];
d.RiskDataNew(1,1) = {'SECURITY'};
d.RiskDataNew(1,2) = {'DATE'};
for i = 1:numFields
d.RiskDataNew(1,i + 2) = fldList(i);
end
%d.RiskDataNew = {};
tic
n = 1;
for x = 1:numSecs
for y = 1:numDates
idx = find(contains(d.RiskData(1:end,1),uniqueSec(x)) & ...
contains(d.RiskData(1:end,2),uniqueDate(y)));
if ~isempty(idx)
n = n + 1;
d.RiskDataNew(n,:)=d.RiskData(idx(1),:);
if length(idx) > 1
numFields = size(d.RiskData,2);
for j = 2:length(idx)
for k = 3:numFields % first two fields are defaults: Security and Date
if (isempty(d.RiskDataNew{n,k}))
d.RiskDataNew{n,k} = d.RiskData{idx(j),k};
end
end
end
end
end
end
end
toc
##### 2 CommentsShowHide 1 older comment
Walter Roberson on 16 Jun 2020
I have not tested, but I have a suspicion that ismember is faster than contains().
contains(d.RiskData(1:end,1),uniqueSec(x))
That sub-expression can be calculated at the for x level. You can possibly even do
for x = 1:numSecs
xidx = find(contains(d.RiskData(1:end,1),uniqueSec(x)));
for y = xidx
idx = find(contains(d.RiskData(1:end,2),uniqueDate(y)));

Sayyed Ahmad on 16 Jun 2020
You have to avoiding the nested loops.
may be you can use the bsxfun to avoiding some loops.
An example:
A = rand(50); % 50-by-50 matrix of random values between 0 and 1
% method 1: slow and lots of lines of code
tic
meanA = mean(A); % mean of every matrix column: a row vector
% pre-allocate result for speed, remove this for even worse performance
result = zeros(size(A));
for j = 1:size(A,1)
result(j,:) = A(j,:) - meanA;
end
toc
clear result % make sure method 2 creates its own result
% method 2: fast and only one line of code
tic
result = bsxfun(@minus,A,mean(A));
toc
Elapsed time is 0.015153 seconds.
Elapsed time is 0.007884 seconds.
see the following links for more details.

Mark McGrath on 16 Jun 2020
Edited: Walter Roberson on 16 Jun 2020
Hello Thanks for soem of the responses.
I am trying to consolidate data and can use a table with summary grouping but i need it to be dynamic as i do not know all the fields ahead of time other than security and date (so can't use). So i am iterating through to find all fields, distinct securities and distinct dates. For security and date i want all fields to be on one row.
The example below happens to show security and date with 3 extra fields. I want the three extra fields to be rolled up into one line. BUT, those 3 extra fields still need to be dynamic as i may have 20 extra fields. Depends on scenario.
testData = {'IBM','01/04/2020',[],'23',1;
'IBM','01/06/2020',[],'20',2;
'IBM','01/08/2020',[],'25',3;
'IBM','01/12/2020',[],'26.23',4;
'IBM','01/20/2020',[],'28.21',5;
'IBM','01/22/2020',[],'56',6;
'IBM','01/04/2020','0.1475',[],7;
'IBM','01/06/2020','0.56',[],8;
'IBM','01/08/2020','0.785',[],9;
'IBM','01/12/2020','0.654',[],10;
'IBM','01/20/2020','0.8941',[],11;
'IBM','01/21/2020','0.7841',[],12;
'IBM','01/30/2020','0.7777',[],13;
'IBM','01/31/2020','0.3333',[],14;
'CSCO','01/04/2020',[],'23',[];
'CSCO','01/06/2020',[],'20',[];
'CSCO','01/08/2020',[],'25',17;
'CSCO','01/12/2020',[],'26.23',[];
'CSCO','01/20/2020',[],'28.21',[];
'CSCO','01/22/2020',[],'56',20;
'CSCO','01/04/2020','0.1475',[],21;
'CSCO','01/06/2020','0.56',[],22;
'CSCO','01/08/2020','0.785',[],23;
'CSCO','01/12/2020','0.654',[],24;
'CSCO','01/20/2020','0.8941',[],25;
'CSCO','01/21/2020','0.7841',[],26;
'CSCO','01/30/2020','0.5555',[],27;
};
dpb on 17 Jun 2020
Are all the "fields", numeric? Then just a cell array would hold them -- the problem will still be processing a different number of elements per cell will kill about any vector operations as will a variable number of struct fields or the like.
The most straightforward way altho a little more memory-costly would be to either define a maximum N and preallocate or determine the max number in the dataset and allocate that size of array using NaN or other way to indicate the missing values.
Or, you could then use a table or timetable with the necessary number of variable columns -- code could then be written for that scenario as well that would be generic based on the table size/number variables/columns.

R2020a

### Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!