Using histcounts to determine loose data mode
As a form of filtering, I'm using histcounts to grab something akin to the mode of a data set. The idea is to lean on histcounts' automatic binning algorithm for the initial data grouping, then regroup so that each run of adjacent non-zero-count bins is merged into a single bin. Finally, I take the edges of the highest-count bin from this second grouping and use them for other bits of data processing.
data = [randi([0,8000],1,20),randi([140000,260000],1,228)];
[xCounts,xEdges] = histcounts(data);
t1 = find(xCounts); t2 = diff([0,diff(t1)==1,0]); %Finds the gaps between populated bin groups
t3 = t1(t2>0); t4 = t1(t2<0); %Starting and ending indices of bin groups
and I'm stuck here. I know that corresponding entries of t3 and t4 give the index range of each group in xCounts (e.g. group 1 is xCounts(t3(1):t4(1))), but I can't figure out how to get a properly vectorized version of sum(xCounts(t3:t4)). The loop version is simple:
xCountsNew = zeros(1,numel(t3));
for i=1:numel(t3)
    xCountsNew(i) = sum(xCounts(t3(i):t4(i)));
end
but I'm trying to improve my vectorization/minimize loops.
So there are really three questions here:
1) Is this a decent way to get a loose mode of a data set?
2) How can I vectorize the above for loop?
3) Should I vectorize the above for loop? I have learned that for loops are generally faster than arrayfun calls, but I feel like there's a way to vectorize the loop without using arrayfun or similar.
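On question 2, one loop-free possibility is to difference a running total. This is only a sketch: it uses a small made-up xCounts vector rather than the random data above, and it assumes t3 and t4 are equal-length vectors of group start and end indices, as produced by the lines above. The name c is introduced here for illustration.

```matlab
% Made-up bin counts for illustration (not from the question's random data)
xCounts = [3 5 0 0 2 7 1 0 4 6];

% Same group detection as in the question
t1 = find(xCounts);
t2 = diff([0, diff(t1)==1, 0]);
t3 = t1(t2>0);                     % first bin of each group
t4 = t1(t2<0);                     % last bin of each group

% Running total with a leading zero, so that c(k+1) == sum(xCounts(1:k))
c = [0, cumsum(xCounts)];

% sum(xCounts(a:b)) equals c(b+1) - c(a), applied to all groups at once
xCountsNew = c(t4+1) - c(t3);

% Loop version from the question, for comparison
xCountsLoop = zeros(1, numel(t3));
for i = 1:numel(t3)
    xCountsLoop(i) = sum(xCounts(t3(i):t4(i)));
end
disp(isequal(xCountsNew, xCountsLoop))
```

With only a handful of groups the loop is likely just as fast, so choosing between the two forms may come down to readability more than speed.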
1 comment
Gabriel Stanley on 23 Mar 2023
Edited: Gabriel Stanley on 23 Mar 2023
Answers (2)
I am not certain what you want to do.
The histcounts function has a third output, bin, that gives, for each element of the input, the index of the bin that element was assigned to.
x = randn(1,25)
[xCounts,xEdges,Bin] = histcounts(x,7)
[~,idx] = max(xCounts)
AssignedToLargestBin = x(Bin == idx)
BinsIdx = ismember(Bin,idx+[-1 0 1])
MaxAdjacentBins = x(BinsIdx) % Return Elements Of 'x' From Largest & Two Adjacent Bins
You can of course set ‘idx’ to whatever you like, and this remains straightforward if there is more than one index, as illustrated here.
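As a rough sketch of how the third output might be combined with the grouping idea from the question, the lines below label each run of adjacent populated bins, total the counts per run with accumarray, and then pull out the elements that fall in the largest run. The data, the explicit edges, and the variable names (populated, isStart, groupId, and so on) are assumptions made for illustration. Isolated populated bins count as runs of length one here.

```matlab
% Made-up sample: a small low cluster and a larger high cluster
data  = [1 2 2 3, 51 52 55 58 61 63 67];
edges = 0:10:80;                              % explicit edges, chosen for illustration
[counts, ~, bin] = histcounts(data, edges);

populated = counts > 0;

% A run starts wherever a populated bin follows an empty bin (or the array start)
isStart = populated & ~[false, populated(1:end-1)];

% Running label per bin; empty bins are reset to 0 so they belong to no run
groupId = cumsum(double(isStart)) .* populated;

% Total count per run of adjacent populated bins
groupCounts = accumarray(groupId(populated).', counts(populated).').';

% Elements of data whose bin belongs to the highest-count run
[~, bestGroup] = max(groupCounts);
selected = data(groupId(bin) == bestGroup)
```

Because groupId is indexed by bin, the selection needs no second pass over the edges.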
I think that you have a case of the XY problem here. Namely, you are asking about your solution to a particular problem, but I suspect there is a more direct way to solve your actual problem.
I'm guessing here, but it seems that you have a data sample, and you want to estimate where the maximal density of that sample is. Is that right?
If that is right, then either you
- know the functional form of the underlying distribution, OR
- do not
Again, I'm guessing, but it seems like you don't.
If both of my guesses are correct, then I would use the ksdensity function to make an empirical estimate of the underlying continuous distribution, and see where the maximum is (using a sufficiently fine grid).
rng default
data = 0.5 + 0.1*randn(1,300); % Using randn instead of rand, so that there is truly a mode
xi = 0 : 0.005 : 1;
data_pdf = ksdensity(data,xi);
figure
hold on
histogram(data,"Normalization","pdf")
plot(xi,data_pdf)
[maxPdf,indexToMax] = max(data_pdf);
xiOfMax = xi(indexToMax)
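If a range around the peak is wanted rather than a single location, one option is to take the contiguous stretch of the grid where the estimated density stays above some fraction of its maximum. The sketch below is an illustration under stated assumptions: it hand-rolls a Gaussian kernel estimate instead of calling ksdensity (so it runs without the Statistics and Machine Learning Toolbox), the bandwidth h is picked by hand and is not what ksdensity would choose, and the half-maximum threshold is arbitrary. The data and the names below, lo, hi, and modeRange are made up for the example.

```matlab
% Made-up sample: a small cluster near 2 and a larger one near 52
data = [1 2 2 3, 50 51 52 52 53 54 55];
xi   = 0 : 0.5 : 60;     % evaluation grid
h    = 2;                % bandwidth: hand-picked assumption, not ksdensity's default

% Gaussian kernel estimate: one bump of width h per sample, averaged
z        = (xi(:) - data(:).') / h;     % numel(xi)-by-numel(data), needs R2016b or later
data_pdf = sum(exp(-0.5*z.^2), 2).' / (numel(data)*h*sqrt(2*pi));

[maxPdf, indexToMax] = max(data_pdf);
xiOfMax = xi(indexToMax);

% Grid points where the density falls below half of its peak value
below = find(data_pdf < 0.5*maxPdf);

% The nearest such points on either side of the peak bound the contiguous stretch
lo = max([0, below(below < indexToMax)]) + 1;
hi = min([numel(xi)+1, below(below > indexToMax)]) - 1;
modeRange = xi([lo hi])
```

Walking outward from the peak keeps the smaller cluster out of the range even if its density also exceeds the threshold.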
Of course, that's a lot of guesswork on my part. But it is typically better to tell us the problem you are trying to solve, in addition to your method.
