Almost always, a probability expression involving factorials is the result of an "N choose K" computation:
N! / (K! (N - K)!)
But it is very inefficient to compute this via factorials and, more importantly, it is not numerically stable. Have a look at your code using factorial(): you got NaN.
In R, the choose(N, K) function computes "N choose K" quickly and stably.
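To make the instability concrete, here is a minimal sketch comparing the two approaches. The values n = 200 and k = 100 are hypothetical, picked only because they are large enough to overflow; they are not taken from your code.

```r
# Hypothetical values, chosen only to trigger the overflow
n <- 200
k <- 100

# factorial(n) exceeds the largest representable double once n > 170,
# so the numerator and the denominator both become Inf
naive <- suppressWarnings(factorial(n) / (factorial(k) * factorial(n - k)))
is.nan(naive)      # Inf / Inf is NaN

# choose() never forms the huge intermediate factorials
stable <- choose(n, k)
is.finite(stable)
```

The factorial version fails even though the final answer itself fits comfortably in a double.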
Now, a careful inspection of your given formulation shows that it is equivalent to:
choose(N-100, 50) / choose(N, 60)
So, you can do:
P <- choose(N-100, 50) / choose(N, 60)
plot(N, P, type = "l")
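If the binomial coefficients themselves ever become too large to represent, one option is to work on the log scale with lchoose() and exponentiate only the final ratio. The sketch below assumes a grid of N from 150 to 2000; that range is my assumption, since the original range is not shown here, and P_log and P_direct are names introduced only for the comparison.

```r
# Assumed grid of candidate N values (not taken from the question)
N <- 150:2000

# Log of the ratio, computed as a difference of log binomial coefficients
logP <- lchoose(N - 100, 50) - lchoose(N, 60)
P_log <- exp(logP)

# Direct ratio, for comparison on a range where it does not overflow
P_direct <- choose(N - 100, 50) / choose(N, 60)
all.equal(P_log, P_direct)
```

On this range both versions should agree to within floating-point tolerance, so the log-scale form is only needed for more extreme inputs.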
Follow-up
Hi, this is a very efficient function. But the mean, mode, and median of this plot don't match the ones in my course materials for the same plot. The mean should be 727, the mode 600, and the median 679. How can I get these descriptive statistics from your suggested plot?
I am confused by what your course material is trying to do. The probability you give is the conditional probability P(D | N), i.e., a probability for the random variable D, whereas we plot P against N. Hence, the plot above is not a probability mass function! How, then, can we use it to compute statistics like the mean, mode, and median of the random variable N?
Well, since you ask, let's pretend this is a probability mass function for the random variable N. But since it is not a true one, sum(P) is not 1, or even close to it. We actually have sum(P) = 3.843678e-12. So, to use it as a proper probability mass function, we need to normalize it first.
P <- P / sum(P)
Now P sums to 1.
To compute the mean, we do
sum(N * P)
# [1] 726.978
To compute the mode, we do
N[which.max(P)]
# 599
To compute the median, we do
N[which(cumsum(P) > 0.5)[1]]
# 679
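The steps above can be bundled into one small helper. This is only a sketch: pmf_summary is a hypothetical name, and the grid 150:2000 is an assumed range, so the numbers it prints may differ slightly from the ones quoted above.

```r
# Hypothetical helper: treat non-negative weights w over values x as a PMF
pmf_summary <- function(x, w) {
  p <- w / sum(w)                              # normalize so the weights sum to 1
  c(mean   = sum(x * p),                       # expected value
    mode   = x[which.max(p)],                  # first maximum if there are ties
    median = x[which(cumsum(p) > 0.5)[1]])     # first x whose cumulative mass exceeds 0.5
}

N <- 150:2000                                  # assumed grid, not from the question
P <- choose(N - 100, 50) / choose(N, 60)
pmf_summary(N, P)
```

Note that P takes essentially the same value at N = 599 and N = 600, and which.max() returns the first maximum it finds, which would explain a mode of 599 rather than the 600 in your course material.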