Replace i and j for loops to make it faster

Amir Alao on 28 May 2021
Edited: Rupesh on 29 Feb 2024
Hi, I have the following code, which associates each company's unique identifier (permno) with a given date and then calculates the daily standard deviation for every company.
I'm fairly new to MATLAB and managed to come up with the code below. However, the double for loop makes it extremely slow because of the size of the original dataset: I let it run overnight and it still had not finished.
I have been advised to look into meshgrid, but I'm not sure how it works. Does anyone know how I could make this code run faster?
pp=unique(data_crsp(:,c.permno)); % unique identifiers (permno) in the original dataset
dd=unique(data_crsp(:,c.date)); % unique dates in the original dataset
volatility=[]; % stores the end result
for i=1:length(dd)
    for j= 1:length(pp)
        % find the rows of the original dataset (data_crsp) that match the j-th unique permno
        q=find(ismember(data_crsp(:,c.permno),pp(j),'rows'));
        % collect the filtered data for the j-th permno (including the returns)
        ddata=data_crsp(q,:);
        % find the rows of ddata (already filtered to permno j) that match the i-th unique date
        p=find(ismember(ddata(:,c.date),dd(i),'rows'));
        % end
        % collect the filtered data for the selected permno and date (including the returns):
        % all info regarding one permno on one date
        dailyd=ddata(p,:);
        ret= dailyd(:,7); % isolates the return values only
        logret=log(dailyd(:,c.ret)+1); % log returns for the given firm on the given date
        vol= std(logret, 0, 'all'); % log volatility of one firm on one date
        volatility=[volatility;pp(j) vol dd(i)] % stores the results as the loop goes
    end
end

Answers (1)

Rupesh on 29 Feb 2024
Edited: Rupesh on 29 Feb 2024
Hi Amir Alao,
I understand that you are trying to speed up MATLAB code that calculates the standard deviation of log returns for each company and date in a large financial dataset. The nested for loops, the repeated calls to find and ismember, and growing the volatility array on every iteration are the likely causes of the slowdown. As for meshgrid, it is a MATLAB function that builds a coordinate grid for evaluating functions over a 2D space. It is useful when you need to perform an operation on every combination of two vectors, for example when plotting a 3D surface.
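To make that concrete, here is a minimal sketch of what meshgrid produces. The vectors x and y are made-up values chosen only for illustration and have nothing to do with the dataset in the question.

```matlab
% Two short vectors (illustrative values only).
x = 1:3;
y = 10:10:20;

% meshgrid expands them into two matrices of the same size:
% each row of X is a copy of x, each column of Y is a copy of y.
[X, Y] = meshgrid(x, y);

% Any elementwise expression is now evaluated for every (x, y) combination.
Z = X + Y;
```

Here X, Y and Z are all 2-by-3 matrices, one entry per combination of an element of x with an element of y.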
However, meshgrid does not fit this problem, which is about grouping rows and computing a statistic per group rather than evaluating a function on a grid of points. To speed up your calculations, you can use logical indexing and preallocation, which should be more efficient than your current approach. The version below keeps the two loops but filters by permno only once per outer iteration, replaces find and ismember with logical comparisons, and preallocates the result matrix:
% Assuming data_crsp is a matrix where:
%   c.permno is the column index for permno in data_crsp
%   c.date   is the column index for date in data_crsp
%   c.ret    is the column index for returns in data_crsp

% Get unique permnos and dates
pp = unique(data_crsp(:,c.permno));
dd = unique(data_crsp(:,c.date));

% Preallocate the volatility matrix with zeros
volatility = zeros(length(pp) * length(dd), 3);

% Create a counter for indexing into the volatility matrix
counter = 1;

% Loop through each unique permno
for j = 1:length(pp)
    % Get all rows for the current permno
    permno_rows = data_crsp(data_crsp(:,c.permno) == pp(j), :);
    % Loop through each unique date
    for i = 1:length(dd)
        % Get all rows for the current date within the current permno
        date_rows = permno_rows(permno_rows(:,c.date) == dd(i), :);
        % Calculate the log returns for the current permno and date
        logret = log(date_rows(:,c.ret) + 1);
        % Calculate the standard deviation of log returns
        vol = std(logret);
        % Store the results
        volatility(counter, :) = [pp(j), vol, dd(i)];
        % Increment the counter
        counter = counter + 1;
    end
end
This code calculates the standard deviation of log returns for each company and date in the dataset. It first extracts the unique company identifiers (permno) and the unique dates, then preallocates a volatility matrix to hold the results. The nested loops iterate over each permno and date and select the matching rows. Inside the inner loop, the log returns are computed and their standard deviation is stored in volatility.
This should significantly reduce the runtime. However, if the dataset is extremely large, more advanced techniques like “accumarray” or “splitapply” might be necessary for further optimization. If you continue to face performance issues, please let me know, and I can assist you with these more complex solutions.
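As a rough sketch of that grouped approach, the loops can be replaced by findgroups together with splitapply, or equivalently accumarray. This assumes a MATLAB release that includes findgroups and splitapply. The toy matrix, its column layout and the date codes are hypothetical stand-ins for your real data_crsp, so treat this as a starting point rather than a drop-in replacement.

```matlab
% Toy data (hypothetical): columns are permno, date, return.
c.permno = 1; c.date = 2; c.ret = 3;
data_crsp = [ ...
    10001 202101  0.010;
    10001 202101 -0.020;
    10001 202102  0.005;
    10002 202101  0.030;
    10002 202101  0.010];

% Assign one group number to each distinct (permno, date) pair.
% permno_g and date_g hold the key values of each group.
[G, permno_g, date_g] = findgroups(data_crsp(:, c.permno), data_crsp(:, c.date));

% Log returns for every row, computed once for the whole dataset.
logret = log(data_crsp(:, c.ret) + 1);

% Standard deviation of the log returns within each group.
vol = splitapply(@std, logret, G);

% Same column layout as the loop version: [permno, vol, date].
volatility = [permno_g, vol, date_g];

% The same statistic via accumarray, shown for comparison.
vol_acc = accumarray(G, logret, [], @std);
```

One difference from the loop version: only (permno, date) pairs that actually occur in the data produce a row, whereas the loops also emit a NaN row for every pair with no observations. A pair with a single observation gives a standard deviation of 0 in both versions.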
You can also refer to the below documents regarding operations of all functions involved in the above script.
Hope this helps!
