
Is there a function in R to create a discrete probability distribution?

I have a set of independent Bernoulli variables, each taking one of two specific values with a given probability. I'm trying to build a simple discrete probability table covering all the possible outcomes. A short example of my data:

# A tibble: 2 x 4
  `test number`  prob value `no-value`
          <dbl> <dbl> <dbl>      <dbl>
1             1   0.7   1.7        0.3
2             2   0.6   1.5        0.6

The example comes from an Excel sheet; the real table is a long list of independent tests. Each test has a value for success, a probability of success, and a value for no success, which occurs with probability (1 - probability of success). The probability table lists every possible outcome: its value, which is the sum of the individual test values for that outcome, and its probability, which is the product of the corresponding probabilities. So the first possible outcome, 3.2 = 1.7 + 1.5, has probability 0.42 = 0.7 * 0.6. The second, 2.3 = 1.7 + 0.6, has probability 0.28 = 0.7 * (1 - 0.6), and so on.
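As a minimal sketch of the arithmetic described above, the two-test example can be worked through with `outer()`. The vector names `prob`, `value` and `no_value` are illustrative stand-ins for the question's columns, not names taken from the original sheet.

```r
# Two-test example from the question (vector names are illustrative)
prob     <- c(0.7, 0.6)   # probability of success for each test
value    <- c(1.7, 1.5)   # value when the test succeeds
no_value <- c(0.3, 0.6)   # value when the test does not succeed

# Rows: test 1 succeeds / fails; columns: test 2 succeeds / fails
totals <- outer(c(value[1], no_value[1]), c(value[2], no_value[2]), "+")
probs  <- outer(c(prob[1], 1 - prob[1]), c(prob[2], 1 - prob[2]), "*")

# One row per outcome: summed value and product of probabilities
outcomes <- data.frame(total = as.vector(totals), prob = as.vector(probs))
outcomes
```

This only covers two tests; it is meant to make the sum-and-product rule concrete, not to scale to a long list.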

So the result I'm trying to get looks something like this (the column names 2.299... and 0.899... are floating-point renderings of 2.3 and 0.9):

# A tibble: 1 x 5
  value       `3.2` `2.299999999999999~ `1.8` `0.8999999999999999~
  <chr>       <dbl>               <dbl> <dbl>                <dbl>
1 probability  0.42               0.280  0.18                 0.12

Here is a way:

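# Example data: one row per independent trial
dat <- data.frame(
  prob = c(0.3, 0.7, 0.6),
  value_success = c(1, 2, 3),
  value_failure = c(4, 5, 6)
)

ntrials <- nrow(dat)

# All 2^ntrials combinations of failure (0) and success (1), one per row
issues <- setNames(
  do.call(expand.grid, replicate(ntrials, c(0,1), simplify = FALSE)),
  paste0("trial", seq_len(ntrials))
)

# Probability of each combination: product of the per-trial probabilities
issues[["prob"]] <- apply(issues, 1, function(x){
  prod(ifelse(x==0, 1-dat$prob, dat$prob))
})

# Total value of each combination: sum of the per-trial values
issues[["total"]] <- apply(issues[,seq_len(ntrials)], 1, function(x){
  sum(ifelse(x==0, dat$value_failure, dat$value_success))
})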

issues
#   trial1 trial2 trial3  prob total
# 1      0      0      0 0.084    15
# 2      1      0      0 0.036    12
# 3      0      1      0 0.196    12
# 4      1      1      0 0.084     9
# 5      0      0      1 0.126    12
# 6      1      0      1 0.054     9
# 7      0      1      1 0.294     9
# 8      1      1      1 0.126     6
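In the table above, several combinations share the same total (9 and 12 each appear three times), whereas the question asks for one probability per possible value. A possible follow-up, sketched here under the assumption that equal totals should simply be merged, is to sum the probabilities by total. The helper name `outcome_distribution` and its `digits` argument are hypothetical, introduced only for illustration; the rounding is there so that floating-point noise such as 2.2999... versus 2.3 does not split equal totals.

```r
# Hypothetical helper: enumerate all outcomes, then merge equal totals
outcome_distribution <- function(prob, value_success, value_failure, digits = 10) {
  n <- length(prob)
  # All 2^n failure (0) / success (1) patterns, one per row
  grid <- as.matrix(expand.grid(replicate(n, c(0, 1), simplify = FALSE)))
  # Probability of each pattern: product over the independent trials
  p <- apply(grid, 1, function(x) prod(ifelse(x == 1, prob, 1 - prob)))
  # Total value of each pattern, rounded to absorb floating-point noise
  total <- apply(grid, 1, function(x) sum(ifelse(x == 1, value_success, value_failure)))
  total <- round(total, digits)
  # Sum the probabilities of patterns that give the same total
  aggregate(prob ~ total, data = data.frame(total = total, prob = p), FUN = sum)
}

# Same data as in the answer above
dist <- outcome_distribution(c(0.3, 0.7, 0.6), c(1, 2, 3), c(4, 5, 6))
dist
```

Because the grid has 2^n rows, this brute-force enumeration is only practical for a moderate number of trials.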

