
How to compare one observation against the rest of the data frame using the Poisson distribution in R?

I want to compare the hp values of cars using the Poisson distribution, to see which car is most likely to have the lowest hp of all the cars. For example, the Mazda RX4 has a horsepower of 110. I want to simulate this value from a Poisson distribution for each car in the sample, and then build a table giving, for each car in the data frame, the probability that it has the lowest value of this indicator.

I am using this example for simplicity. In reality, the rows are golf players and the horsepower is the number of strokes taken, which is why I want a list giving the probability that each observation in my sample has the lowest score on this indicator.

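df <- mtcars

# Share of 100 simulated pairs in which a Poisson(n1) draw is below a Poisson(n2) draw
f <- function(n1, n2) {
  mean(rpois(100, n1) < rpois(100, n2))
}

# Vectorize f so that outer() can apply it to every pair of hp values
g <- Vectorize(f, c("n1", "n2"))
res <- outer(df$hp, df$hp, g)
dimnames(res) <- list(row.names(df), row.names(df))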

This code compares every car with every other car, but I want a list that compares each car with all the cars in the data frame, giving the probability that it has the lowest score. For example, the probability that the Mazda RX4 has the lowest value in the data frame, something like this:

            prob
Mazda RX4   0.03
Datsun 710  0.02
Duster 360  0.02

And so on until the last car in the sample. prob is the probability that the car has the lowest hp in the sample.

I'm not quite sure I understand your question correctly, but here's an example that simulates a Poisson draw for each car based on its original value and summarizes how those simulations compare:

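library(tidyverse)

# mtcars[1] keeps only the first column, mpg; to work with horsepower instead,
# select mtcars["hp"] and swap mpg for hp in the steps below
df <- mtcars[1] %>% rownames_to_column("car")

df %>%
  # 10,000 copies of every car; run identifies the simulation each copy belongs to
  uncount(10000, .id = "run") %>%
  # one Poisson draw per row, with the car's observed mpg as the mean
  rowwise() %>%
  mutate(sim_mpg = rpois(1, lambda = mpg)) %>%

  # within each run, flag the car with the lowest simulated value
  # (ties go to whichever tied row happens to sort first)
  group_by(run) %>%
  arrange(sim_mpg) %>%
  mutate(lowest_mpg = row_number() == 1) %>%

  # share of runs in which each car came out lowest
  group_by(car) %>%
  summarize(chance_lowest = mean(lowest_mpg),
            orig_mpg = first(mpg)) %>%

  ggplot(aes(orig_mpg, chance_lowest, label = car)) +
  geom_text(hjust = 0, check_overlap = TRUE) +
  scale_y_continuous(trans = scales::pseudo_log_trans(sigma = 0.001),
                     labels = scales::percent_format(accuracy = 1),
                     breaks = c(0, 0.01, 0.1*(1:4)))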

[Figure: chance_lowest plotted against orig_mpg, with one text label per car]
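
If the goal is the plain table from the question (one probability per car, based on hp), a base R version might look like the sketch below. It is only an illustration under stated assumptions: each car's hp is treated as an independent Poisson draw with mean equal to the observed hp, and the names n_sim, lambda, prob_sim, prob_exact and result, as well as the seed, are made up for this example. It estimates the probability by simulation (ties for the lowest value are broken at random) and, as a cross-check, computes the exact probability of being the sole lowest, P(X_i < X_j for all j != i) = sum over k of P(X_i = k) * prod over j != i of P(X_j > k).

```r
# Assumption: each car's hp is an independent Poisson draw whose mean is the observed hp.
set.seed(42)       # arbitrary seed, only for reproducibility
n_sim  <- 10000    # number of simulated rounds (arbitrary choice)
lambda <- setNames(mtcars$hp, rownames(mtcars))
n_cars <- length(lambda)

# --- Monte Carlo estimate -------------------------------------------------
# n_sim x n_cars matrix: one row per simulated round, one column per car.
# rep(lambda, each = n_sim) matches the column-major fill of matrix().
sims <- matrix(rpois(n_sim * n_cars, lambda = rep(lambda, each = n_sim)),
               nrow = n_sim)

# Column index of the lowest value in each round; ties are broken at random.
winner   <- max.col(-sims, ties.method = "random")
prob_sim <- tabulate(winner, nbins = n_cars) / n_sim

# --- Exact probability of being the sole lowest ---------------------------
# Counts above this cutoff carry negligible probability for every car.
k <- 0:qpois(1 - 1e-12, max(lambda))

# log P(X_j > k): one row per count k, one column per car j.
log_surv <- sapply(lambda, function(l) ppois(k, l, lower.tail = FALSE, log.p = TRUE))

# For car i: sum over k of P(X_i = k) times the product of P(X_j > k) over the other cars.
prob_exact <- vapply(seq_len(n_cars), function(i) {
  sum(dpois(k, lambda[[i]]) * exp(rowSums(log_surv[, -i, drop = FALSE])))
}, numeric(1))

result <- data.frame(prob_sim, prob_exact, row.names = names(lambda))
head(result[order(-result$prob_sim), ], 5)
```

The two columns are not expected to match exactly: prob_sim sums to 1 because every round gets a winner, while prob_exact leaves out rounds in which several cars tie for the lowest value. For the golf data, the same sketch should carry over by replacing mtcars$hp with the strokes column and the row names with the player names.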

