Multiply two probability plots (CDF/PDF)

I have two probability plots, one generated as a CDF and one as a PDF. The exact mathematics is not important for my purpose; I only want to extract the qualitative idea.
This is the code I used:
figure()
ax1 = subplot(1,1,1);
cdfplot(temp);
% plot(DE,y)
ax1.XDir = 'reverse';
set(gca, 'YScale', 'log')
figure();
pd_HOLT = fitdist(total_HOLT,'Normal');
DE_HOLT = bingroups_HOLT;
y_HOLT = pdf(pd_HOLT,DE_HOLT);
ax1 = subplot(1,1,1);
plot(DE_HOLT,y_HOLT)
ax1.XDir = 'reverse';
set(gca, 'YScale', 'log')
The x-axis is the same. How can I multiply these plots to convey a (qualitative) idea? Thanks.

4 comments

Malay Agarwal on 19 Sep 2024
Please share details about the variables A.Var7, total_HOLT and others so that your code can be executed.
Jeff Miller on 20 Sep 2024
What are you trying to convey an idea about? There seem to be several possibilities:
  • the distribution of a random variable that is the sum of independent A.Var7 and total_HOLT values
  • the distribution of a random variable that is the product of independent A.Var7 and total_HOLT values
  • the joint (bivariate) distribution of independent A.Var7 and total_HOLT values
  • something else
Deepayan Bhadra on 20 Sep 2024
Hi @Malay Agarwal, I have uploaded the data (note: changed A.var7 -> temp)
Deepayan Bhadra on 20 Sep 2024
Hi @Jeff Miller: Since I am trying to do a point-wise multiplication, the idea I am trying to convey is the (combined) decreasing trend as we proceed towards -100.

Accepted Answer

Umar on 20 Sep 2024

Hi @Deepayan Bhadra,

You asked, “How can I multiply these plots to convey a (qualitative) idea?”

Please see my response to your comments below.

To combine these two plots while keeping them readable, I suggest overlaying them in a single figure. This lets you see the cumulative probabilities and the density at the same time. After going through your code, here is an example snippet to help you out.

% Generate some generic data for demonstration
data = randn(1000, 1); % Normal distributed data
% Create CDF plot
figure();
ax1 = subplot(1,1,1);
cdfplot(data);
hold on;
% Fit a normal distribution to the data
pd = fitdist(data,'Normal');
% Generate x values for PDF
x_values = linspace(min(data), max(data), 100);
y_values = pdf(pd, x_values);
% Plot PDF on the same axes
plot(x_values, y_values, 'r-', 'LineWidth', 2);
% Reverse x-axis and set log scale for y-axis
ax1.XDir = 'reverse';
set(gca, 'YScale', 'log');
% Add legends and labels
legend('CDF', 'PDF');
xlabel('Value');
ylabel('Probability');
title('Combined CDF and PDF Plot');
hold off;

In the example above, randn(1000, 1) creates a sample dataset that follows a normal distribution, and cdfplot(data) draws its empirical cumulative distribution function. The code then fits a normal distribution to the data and evaluates its PDF over a range of x-values. The hold on command lets you overlay the PDF on the CDF in the same axes; for more information on this command, please refer to

https://www.mathworks.com/help/matlab/ref/hold.html#

The x-axis is reversed and a logarithmic scale is applied to the y-axis so that both curves stay visible. When you view both plots together, note how the probability density (PDF) at each point contributes to the cumulative probability (CDF). This dual representation helps you understand both the local behavior (PDF) and the global behavior (CDF) of your data distribution. Feel free to adjust line styles, colors, and markers for better visual distinction between the CDF and the PDF.
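To make the PDF-to-CDF relationship concrete, here is a minimal sketch that rebuilds a CDF by numerically integrating a PDF. It assumes a standard normal distribution written out by hand (so it needs no toolbox); the grid limits and variable names are illustrative choices, not taken from the original data.

```matlab
% Sketch: recover a CDF by integrating a PDF (base MATLAB only).
% Assumption: standard normal distribution, formulas written out by hand.
x = linspace(-6, 6, 1201);                  % evaluation grid (illustrative limits)
pdf_vals = exp(-x.^2 / 2) / sqrt(2*pi);     % standard normal PDF
cdf_exact = 0.5 * (1 + erf(x / sqrt(2)));   % closed-form standard normal CDF
cdf_numeric = cumtrapz(x, pdf_vals);        % running trapezoidal integral of the PDF
max_err = max(abs(cdf_numeric - cdf_exact));
fprintf('Largest difference between the two CDFs: %.2e\n', max_err);
```

The two CDF curves should agree closely, which is the sense in which the density at each point accumulates into the cumulative probability.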

Please see attached.

If you have any further questions, please let me know.

11 comments

Deepayan Bhadra on 20 Sep 2024
@Umar: Thanks for your effort, but I think you got my intention wrong. I was not trying to overlay two plots. As explained above, I want one plot that conveys the diminishing trend. I don't even need to preserve them as probability plots. If I can convert them first to standard curves for multiplication, that also works.
Umar on 21 Sep 2024

Hi @Deepayan Bhadra,

You mentioned, “I want one plot that conveys the diminishing trend. I don't even need to preserve them as probability plots.”

Please see my response to your comments below. Thanks for clarifying your plot requirements. To combine the CDF and PDF into a single plot that conveys a diminishing trend, first normalize both curves so that they are on a comparable scale, then multiply the standardized curves point by point, and finally plot the product. I modified the example above, again using generic data; please let me know if this resolves the issue.

% Generate some generic data for demonstration
data = randn(1000, 1); % Normally distributed data
% Create CDF
[values_cdf, x_cdf] = ecdf(data);
cdf_standardized = values_cdf / max(values_cdf); % Standardize CDF
% Fit a normal distribution to the data for PDF
pd = fitdist(data, 'Normal');
x_pdf = linspace(min(data), max(data), 100);
pdf_values = pdf(pd, x_pdf);
pdf_standardized = pdf_values / max(pdf_values); % Standardize PDF
% Multiply standardized CDF and PDF (PDF interpolated onto the CDF's x-values)
combined_curve = cdf_standardized .* interp1(x_pdf, pdf_standardized, x_cdf, 'linear', 'extrap');
% Plotting
figure();
plot(x_cdf, combined_curve, 'b-', 'LineWidth', 2);
hold on;
plot(x_cdf, cdf_standardized, 'r--', 'LineWidth', 1.5); % Original CDF for reference
plot(x_pdf, pdf_standardized, 'g--', 'LineWidth', 1.5); % Original PDF for reference
xlabel('Value');
ylabel('Combined Value');
title('Combined Diminishing Trend from CDF and PDF');
legend('Combined Curve', 'Standardized CDF', 'Standardized PDF');
set(gca, 'YScale', 'log'); % Log scale for better visibility
hold off;

Please see attached.

In the above code, the logarithmic scale helps in visualizing trends, especially for probabilities or densities that span several orders of magnitude. Standardizing both curves (dividing each by its maximum) puts them on a comparable scale so that they can be meaningfully combined, and the resulting combined curve shows how the likelihood of occurrence diminishes as you move away from the peak density. If you still have any further questions or need additional adjustments, please let me know!

Sam Chak on 21 Sep 2024
Hi @Umar, I believe @Deepayan Bhadra's initial intention was to qualitatively interpret the meaning of the multiplication of the CDF and the PDF, or to derive the interpretation from the diminishing trend.
Umar on 21 Sep 2024
Hi @Sam Chak,
Could you please point out what is missing in my code? In my opinion, the approach I used integrates both the CDF and the PDF into a single plot that conveys the diminishing trend through normalization and multiplication. If further adjustments or clarifications are needed regarding specific aspects of this implementation or its interpretation, please provide your feedback or suggestions.
Umar on 21 Sep 2024

“I believe @Deepayan Bhadra's initial intention was to qualitatively interpret the meaning of the multiplication of the CDF and the PDF, or to derive the interpretation from the diminishing trend.”

Your comments are duly noted and I do respect your opinion. Let me briefly explain my most recent example code. The process begins by generating normally distributed data and calculating both the CDF and the PDF. Each function is then standardized so that the two can be meaningfully combined. The key step is multiplying the standardized CDF and PDF, which gives a visual representation of their interaction. The resulting combined curve is plotted alongside the standardized CDF and PDF for reference. This approach not only highlights the diminishing trend but also provides a qualitative view of how the two curves interact.

The logarithmic scale enhances visibility, making it easier to observe the diminishing nature of the combined curve. This method meets the OP's (@Deepayan Bhadra's) requirements by creating a single plot that conveys the desired trend without preserving the original probability characteristics. I would also suggest that @Deepayan Bhadra point out which part of the code is not achieving her goal, so that I can provide a workaround or share technical tips. The whole purpose of this MathWorks community is to share knowledge and help OPs achieve their goals, which not only helps them understand the concept but also motivates them to learn more.

Sam Chak on 21 Sep 2024
Thank you for your feedback. There is nothing wrong with your code; it was my attempt to express what the OP wants. My interpretation may be inaccurate until @Deepayan Bhadra provides clarification.
Since all three curves (CDF, PDF, and CDF × PDF) should exhibit distinct trends, it is reasonable to assume that the OP wants to interpret the trends of these curves in an understandable manner using linguistic terms. Of course, this interpretation depends on the expert making the assessment.
Does the OP intend to use the product of CDF × PDF to predict or explain the behavior of a statistical model?
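One possible reading of a point-wise CDF × PDF product, offered as a sketch and not as what the OP necessarily intends: for two independent quantities X and Y, the curve F_X(t) · f_Y(t) integrates to P(X ≤ Y). The distributions below (X ~ N(0,1), Y ~ N(1,1)) are made-up assumptions for illustration and are unrelated to the uploaded data; the formulas are written out by hand so that no toolbox is required.

```matlab
% Sketch: the area under F_X(t) .* f_Y(t) equals P(X <= Y) for independent X, Y.
% Assumptions (illustrative only): X ~ N(0,1) and Y ~ N(1,1).
t = linspace(-10, 10, 2001);                 % common grid for both curves
F_X = 0.5 * (1 + erf(t / sqrt(2)));          % CDF of X on the grid
f_Y = exp(-(t - 1).^2 / 2) / sqrt(2*pi);     % PDF of Y on the same grid
product_curve = F_X .* f_Y;                  % point-wise CDF x PDF
area_under_product = trapz(t, product_curve);
p_exact = 0.5 * (1 + erf(0.5));              % P(X <= Y), since Y - X ~ N(1, 2)
fprintf('Area under product: %.6f, P(X <= Y): %.6f\n', area_under_product, p_exact);
```

Note that this interpretation holds only for the unscaled CDF and PDF; once each curve is divided by its maximum, the product keeps its shape but its area is no longer a probability.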
Umar on 21 Sep 2024
Hi @Sam Chak,
I agree that the distinct trends exhibited by the CDF, PDF, and their product are crucial for a comprehensive understanding. It is indeed essential for us to await clarification from @Deepayan Bhadra to make sure our interpretations align with the OP’s expectations.
Regarding your question about the use of the product of CDF × PDF in predicting or explaining statistical model behavior, again it would be beneficial to confirm this directly with the OP. Gaining clarity on their intentions will help us provide more accurate guidance moving forward.
Thank you once again for your valuable input. I look forward to collaborating further as we refine our approach based on additional information.
Deepayan Bhadra on 23 Sep 2024
Hi @Umar, fantastic stuff! This is exactly what I wanted. I guess I just didn't know the ecdf function.
Umar on 24 Sep 2024
Edited: Umar on 24 Sep 2024

Hi @Deepayan Bhadra,

Glad to know your problem is resolved. For more information on the ecdf function, please refer to

https://www.mathworks.com/help/stats/ecdf.html

Deepayan Bhadra on 25 Sep 2024
Edited: Deepayan Bhadra on 25 Sep 2024
@Umar: A follow up question: If you look at the vectors total_HOLT and bingroups_HOLT in the data, how can I plot a simple discrete probability?
>> a = [70,23,3,0,0,0,0,0,0,0];
>> b = [-90,-80,-70,-60,-50,-40,-30,-20,-10,0];
For example, I want to show that if we have a 'b' value somewhere between (-90,-70), then we have a high probability dictated by the corresponding values in 'a', but a negligible probability for all other values in 'b'.
Umar on 26 Sep 2024
Edited: Umar on 27 Sep 2024

Hi @Deepayan Bhadra,

After reviewing your comments a second time, I apologize for misunderstanding them. I have edited my comments in the post above. Your goal is to visualize how the values in vector a correspond to the values in vector b, specifically showing the high probabilities within a certain range of b. You also want to represent discrete probabilities derived from the two vectors, where:

  • Vector a contains probability values.
  • Vector b defines corresponding ranges.

In your given example, you are interested in showing that a value in vector b between -90 and -70 corresponds to a high probability from vector a. You already defined the data by representing them in two vectors:

   a = [70, 23, 3, 0, 0, 0, 0, 0, 0, 0];
   b = [-90, -80, -70, -60, -50, -40, -30, -20, -10, 0];

It will be helpful to normalize your probabilities so that they sum to 1 if you are treating them as probabilities. In this case:

     total_probability = sum(a);
     normalized_a = a / total_probability; % Normalize 'a'

Then use a bar plot to represent the probabilities clearly. Here’s how you can implement this in MATLAB:

   % Create the bar plot for probabilities
   figure();
   bar(b, normalized_a); % Use 'b' for x-axis and normalized probabilities for y-axis
   xlabel('Value (b)');
   ylabel('Probability');
   title('Discrete Probability Distribution');
   % Highlight the range of interest
   hold on;
   xline(-90,'r--','Start Range','LabelHorizontalAlignment','left');
   xline(-70,'g--','End Range','LabelHorizontalAlignment','right');
   % Customize y-axis limits if necessary
   ylim([0 max(normalized_a) * 1.1]);
   % Add grid for better visibility
   grid on;
   hold off;

As you can see in the example snippet, the bar function creates a bar plot in which each bar represents the probability associated with one value in vector b. The xline function adds vertical lines at -90 and -70 to mark the range of interest, and the normalization makes the bar heights sum to 1, as probabilities should. Finally, grid on makes the values easier to read. This visualization shows clearly that the values in vector a corresponding to b values between -90 and -70 are much higher than those outside this range, which can be crucial when interpreting data distributions or making decisions based on these probabilities.
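To put a number on the "high probability inside the range" statement, the following sketch sums the normalized probabilities inside and outside [-90, -70]. It assumes the entries of a are counts (or unnormalized weights) for the matching entries of b, and that both end points belong to the range; the names in_range, p_in_range, and p_outside are introduced here for illustration.

```matlab
% Sketch: total probability inside vs. outside the range of interest.
% Assumption: 'a' holds counts (unnormalized weights) for the values in 'b'.
a = [70, 23, 3, 0, 0, 0, 0, 0, 0, 0];
b = [-90, -80, -70, -60, -50, -40, -30, -20, -10, 0];
p = a / sum(a);                          % normalize so the values sum to 1
in_range = (b >= -90) & (b <= -70);      % logical mask, end points included
p_in_range = sum(p(in_range));           % probability mass inside the range
p_outside = sum(p(~in_range));           % probability mass everywhere else
fprintf('P(-90 <= b <= -70) = %.4f, outside = %.4f\n', p_in_range, p_outside);
```

With these example vectors all of the mass falls inside the range, because every entry of a outside it is zero.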

Hope this answers your question. If you have further questions or need additional modifications, feel free to ask!

More Answers (0)

Release: R2022b

Asked: 19 Sep 2024

Edited: 27 Sep 2024