I have a text file that contains data such as:
ranking index tornado reports hail reports wind reports
0.3968208 9 1 7
0.156263 2 0 3
0.1444246 10 1 7
0.2830781 7 2 6
0.1258707 12 0 2
0.2452705 6 0 6
0.07492937 6 2 8
0.1862151 8 1 5
0.3258324 6 2 17
0.09579834 2 2 10
0.8557362 11 3 14
0.05694438 8 3 9
0.6755703 4 3 24
1.695709 14 0 5
1.242222 17 2 12
0.220234 7 1 7
0.5113825 6 0 6
0.2355718 3 0 12
0.0799512 1 1 6
1.267324 15 2 6
0.0862502 7 1 3
1.151916 33 2 6
0.06002221 9 0 17
0.2011567 11 5 17
I need to find the probability of a wind outbreak being major (ranking index larger than 0.25), given that the number of hail reports is larger than 10, the number of wind reports is larger than 20, and the number of tornado reports is larger than 5. How can I do this?
I am assuming this is only part of the complete data. The dplyr-based solution below uses the conditions hail_reports > 2 & wind_reports > 2 & tornado_reports > 5, because with the original thresholds no rows in this test data match and the probability cannot be estimated. Modify the thresholds appropriately for the complete data.
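library(dplyr)
# df is assumed to hold the data above, with underscores in the column names
df %>%
  filter(hail_reports > 2 & wind_reports > 2 & tornado_reports > 5) %>%
  mutate(major = if_else(ranking_index > 0.25, 1, 0)) %>%  # major = 1: index > 0.25
  group_by(major) %>% summarize(n = n()) %>%               # count rows per class
  transmute(major, prob = n / sum(n))                      # convert counts to proportions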
# major prob
# <dbl> <dbl>
# 1 0 0.667
# 2 1 0.333 # major prob = 0.333
PS: It is always better to avoid spaces in column names, e.g. use "hail_reports" instead of "hail reports".
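For anyone who wants to reproduce the number above without building df by hand, here is a minimal base R sketch (not the dplyr pipeline itself). It assumes the sample rows from the question are pasted in with underscore column names. The helper name cond_prob_major and its arguments are hypothetical, introduced only for illustration. It returns NA when no rows satisfy the conditions, since the conditional probability cannot be estimated from an empty subset.

```r
# Sample rows copied from the question; header rewritten with underscores
txt <- "ranking_index tornado_reports hail_reports wind_reports
0.3968208 9 1 7
0.156263 2 0 3
0.1444246 10 1 7
0.2830781 7 2 6
0.1258707 12 0 2
0.2452705 6 0 6
0.07492937 6 2 8
0.1862151 8 1 5
0.3258324 6 2 17
0.09579834 2 2 10
0.8557362 11 3 14
0.05694438 8 3 9
0.6755703 4 3 24
1.695709 14 0 5
1.242222 17 2 12
0.220234 7 1 7
0.5113825 6 0 6
0.2355718 3 0 12
0.0799512 1 1 6
1.267324 15 2 6
0.0862502 7 1 3
1.151916 33 2 6
0.06002221 9 0 17
0.2011567 11 5 17"
df <- read.table(text = txt, header = TRUE)

# Hypothetical helper: share of "major" rows among those exceeding all thresholds
cond_prob_major <- function(data, hail_min, wind_min, tornado_min, cutoff = 0.25) {
  keep <- data$hail_reports > hail_min &
    data$wind_reports > wind_min &
    data$tornado_reports > tornado_min
  sub <- data[keep, ]
  if (nrow(sub) == 0) return(NA_real_)  # no matching rows: estimate is undefined
  mean(sub$ranking_index > cutoff)      # fraction of matching rows that are major
}

p_relaxed  <- cond_prob_major(df, 2, 2, 5)    # thresholds used in the answer
p_original <- cond_prob_major(df, 10, 20, 5)  # thresholds from the question
print(p_relaxed)
print(p_original)
```

With the relaxed thresholds, three rows match and one of them is major, which agrees with the 0.333 above. With the question's thresholds the sample has no matching rows, so the result is NA rather than zero.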
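I think this is an impossible event, because in the given dataset the number of hail reports is never larger than 10. Or is the data provided above just a sample rather than the complete set?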