Consecutive count of values based on multiple conditions

3 visualizaciones (últimos 30 días)
mtango
mtango el 19 de Nov. de 2020
Respondida: Setsuna Yuuki. el 19 de Nov. de 2020
I have a very big table consisting of approximately 3 million rows. The table consists of four columns [ID period amount pbehind], see the following screenshot including the output column I want.
For every ID, I want to count the consecutive amount of times of behind payment (pbehind). But there are a few conditions to this:
  1. when pbehind <= 0.5, then output = 0
  2. when pbehind > 1.5, but value in previous period is 0, then output = 0;
  3. when pbehind > 0.5, but amount in next period (which is last period) is zero, then output=0.
There are a couple of things to note:
  1. period does not have to start at 1 and increase by 1 (see ID = 2). However it is ordered by ID and period using sortrows(table,[1,2]).
  2. An ID can make multiple payments for a certain period, however pbehind will always be equal to for that period (see ID=3).
I know for the first condition that I can just do the following:
output = pbehind;
output(output<=0.5)=0;
But I cannot figure out how to efficiently implement the other conditions. Does anyone know how to do this?

Respuestas (2)

Peter Perkins
Peter Perkins el 19 de Nov. de 2020
Put your data in a table, and use rowfun with ID as the grouping variable. Write a function that does all the logic you need to enforce. Your function will be applied to one group of rows at a time, and should return a column vector of output values. Here's a simple example:
>> t = table([1;1;1;2;2],rand(5,1),rand(5,1),'VariableNames',{'ID' 'X' 'Y'})
t =
5×3 table
ID X Y
__ ________ ________
1 0.82202 0.26854
1 0.041591 0.63908
1 0.91635 0.031734
2 0.17678 0.33395
2 0.92236 0.8908
>> t2 = rowfun(@(x,y) x - mean(y),t,'GroupingVariable','ID')
t2 =
5×3 table
ID GroupCount Var3
__ __________ ________
1 3 0.50891
1 3 -0.27152
1 3 0.60323
2 2 -0.4356
2 2 0.30998
>> t.Output = t2.Var3
t =
5×4 table
ID X Y Output
__ ________ ________ ________
1 0.82202 0.26854 0.50891
1 0.041591 0.63908 -0.27152
1 0.91635 0.031734 0.60323
2 0.17678 0.33395 -0.4356
2 0.92236 0.8908 0.30998
Obviously you will need to write your own function.
  1 comentario
mtango
mtango el 19 de Nov. de 2020
The problem is that I cannot figure out how to create code to count the consecutive amount of times of behind payment (pbehind) for each ID...

Iniciar sesión para comentar.


Setsuna Yuuki.
Setsuna Yuuki. el 19 de Nov. de 2020
you should try with a series of if..else. For example:
for i = length(output)
if(pbehind(i) > 1.5 && pbehind(i-1) == 0)
output(i) =0;
elseif(condition)
...
end

Categorías

Más información sobre Tables en Help Center y File Exchange.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by