How to compare one observation against the rest of the data frame using a Poisson distribution in R?

I want to compare the cars' hp values using a Poisson distribution to see which car is most likely to have the lowest hp of all the cars. For example, the Mazda RX4 has 110 horsepower; I want to simulate this value from a Poisson distribution for each car in the sample. I then want a table that gives, for each car in the data frame, the probability that it has the lowest value of this indicator.

I'm using this example for simplicity. In reality the rows are golf players' names and the horsepower is the number of strokes taken, which is why I want a list giving, for each observation in my sample, the probability that it has the lowest score on this indicator.

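df <- mtcars

# Estimate P(X1 < X2) for X1 ~ Poisson(n1), X2 ~ Poisson(n2) from 100 paired draws
f <- function(n1, n2) {
  mean(rpois(100, n1) < rpois(100, n2))
}

# outer() needs a vectorised function, so wrap f
g <- Vectorize(f, c("n1", "n2"))
# res[i, j] = estimated probability that car i's simulated hp is below car j's
res <- outer(df$hp, df$hp, g)
dimnames(res) <- list(row.names(df), row.names(df))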

This code compares every car with every other car pairwise, but I want a list that compares each car with all the cars in the data frame at once, giving the probability that it has the lowest score. For example, the probability that the Mazda RX4 has the lowest value in the data frame; something like this:

            prob
Mazda RX4   0.03
Datsun 710  0.02
Duster 360  0.02

And so on, down to the last car in the sample. Here prob is the probability that the car has the lowest hp in the sample.
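A minimal base-R sketch that produces a table in this shape, assuming each car's hp is used as the mean (lambda) of an independent Poisson distribution: simulate every car in the same round and count how often each one comes out lowest. The number of rounds (n_sims) and the seed are arbitrary choices, and ties for the minimum are counted for every tied car, so the column can sum to slightly more than 1. For the golf case, hp would be replaced by each player's expected number of strokes.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
lambda <- mtcars$hp                # assumption: each car's hp is its Poisson mean
n_cars <- length(lambda)
n_sims <- 10000                    # number of simulated rounds (arbitrary)

# sims[s, i] = car i's simulated value in round s. rpois() recycles lambda,
# so filling the matrix by row gives column i the mean lambda[i].
sims <- matrix(rpois(n_sims * n_cars, lambda),
               nrow = n_sims, ncol = n_cars, byrow = TRUE)

# TRUE where a car equals the minimum of its round (ties: every tied car is TRUE)
is_lowest <- sims == apply(sims, 1, min)

# Share of rounds in which each car was (jointly) lowest
prob <- data.frame(prob = colMeans(is_lowest), row.names = rownames(mtcars))
print(head(prob[order(-prob$prob), , drop = FALSE]))
```

Simulating all cars jointly matters: the pairwise matrix res cannot simply be multiplied along a row, because the events "car i beats car j" for different j all depend on the same draw of car i and are therefore not independent.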

I'm not sure I'm understanding your question correctly, but here is an example (using mpg rather than hp) that simulates a Poisson distribution based on each car's original value and summarizes how often each car comes out lowest across those simulations:

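library(tidyverse)

# Keep only mpg and move the row names into a "car" column
df <- mtcars[1] %>% rownames_to_column("car")

df %>%
  # 10,000 copies of each car, tagged with a simulation run id
  uncount(10000, .id = "run") %>%
  # One Poisson draw per row; rpois() is vectorised over lambda, so rowwise() is not needed
  mutate(sim_mpg = rpois(n(), lambda = mpg)) %>%

  # Within each run, flag the car(s) with the lowest simulated value.
  # Ties count for every tied car (e.g. Cadillac Fleetwood and Lincoln Continental
  # share mpg = 10.4), instead of always going to the car listed first.
  group_by(run) %>%
  mutate(lowest_mpg = sim_mpg == min(sim_mpg)) %>%

  # Share of runs in which each car was (jointly) lowest
  group_by(car) %>%
  summarize(chance_lowest = mean(lowest_mpg),
            orig_mpg = first(mpg)) %>%

  ggplot(aes(orig_mpg, chance_lowest, label = car)) +
  geom_text(hjust = 0, check_overlap = TRUE) +
  scale_y_continuous(trans = scales::pseudo_log_trans(sigma = 0.001),
                     labels = scales::percent_format(accuracy = 1),
                     breaks = c(0, 0.01, 0.1 * (1:4)))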
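As a cross-check on the simulation, the same probabilities can be computed without sampling, assuming the draws are independent Poisson variables X_i with mean lambda_i: car i is (jointly) lowest with probability sum over k of P(X_i = k) * prod over j != i of P(X_j >= k). The helper prob_lowest() below is a hypothetical name used only for this sketch; it truncates the infinite sum at a quantile that leaves out less than tol of each distribution's mass. Up to Monte Carlo noise and tie handling, its output should be close to the chance_lowest values from the simulation, and passing mtcars$hp instead of mtcars$mpg gives the hp version asked about in the question.

```r
# Exact probability that each car's Poisson draw is the (jointly) lowest,
# assuming independent X_i ~ Poisson(lambda_i):
#   P(i lowest) = sum_k P(X_i = k) * prod_{j != i} P(X_j >= k)
prob_lowest <- function(lambda, labels = NULL, tol = 1e-12) {
  # Values of k that carry all but < tol of every distribution's mass
  k <- 0:max(qpois(1 - tol, lambda))
  # surv[k, j] = P(X_j >= k) = P(X_j > k - 1); equals 1 when k = 0
  surv <- sapply(lambda, function(l) ppois(k - 1, l, lower.tail = FALSE))
  # pmf[k, i] = P(X_i = k)
  pmf <- sapply(lambda, function(l) dpois(k, l))
  p <- vapply(seq_along(lambda), function(i) {
    others <- surv[, -i, drop = FALSE]
    sum(pmf[, i] * apply(others, 1, prod))   # every other car is at least k
  }, numeric(1))
  setNames(p, labels)
}

p_mpg <- prob_lowest(mtcars$mpg, rownames(mtcars))
print(head(sort(p_mpg, decreasing = TRUE)))
```

Because ties are counted for every tied car, these probabilities sum to at least 1; for strictly-lowest probabilities, P(X_j > k) (that is, ppois(k, l, lower.tail = FALSE)) would be used instead.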

