Split groups using conditions and get percentile value
I have the dataset below, which contains traffic delay data for several roads. I would like a summary report for each road, split by "Day" and "Time", with the 5th and 95th percentiles of Delay calculated for each group.
Here is the dataset:
my.data <- read.table(text = '
Name Day Time Delay
road1 1 7 10
road1 1 7 11
road1 1 7 12
road1 2 8 10
road1 3 9 11
road2 1 7 12
road2 2 8 10
road3 1 7 11
road3 1 7 12
road3 3 9 13
', header = TRUE, stringsAsFactors = FALSE, na.strings = 'NA')
I would like to get this kind of report:
# result:
# Name Day Time Delay_5% Delay_95%
# road1 1 7 10 12
# road1 2 8 10 10
# road1 3 9 11 11
# road2 1 7 12 12
# road2 2 8 10 10
# road3 1 7 11 12
# road3 3 9 13 13
I wrote the script below, but it does not give me the desired result:
my.summary <- with(my.data, aggregate(list(Delay), by = list(Day,Time),
FUN = function(x) { road.percentile = quantile(x,c(0.05,0.95),na.rm = TRUE) } ))
my.summary <- do.call(data.frame, my.summary)
colnames(my.summary) <- c('Day', 'Rate')
my.summary
my.data <- merge(my.data, my.summary, by = ('Day',"Time"))
my.data
Could anyone help me solve this problem? Much appreciated!
Using dplyr:
library(dplyr)

# 5th and 95th percentiles per road/day/time, rounded to whole numbers
my.data %>%
  group_by(Name, Day, Time) %>%
  summarise(Delay_5 = round(quantile(Delay, 0.05), 0),
            Delay_95 = round(quantile(Delay, 0.95), 0))
# A tibble: 7 x 5
# Groups: Name, Day [?]
Name Day Time Delay_5 Delay_95
<chr> <int> <int> <dbl> <dbl>
1 road1 1 7 10. 12.
2 road1 2 8 10. 10.
3 road1 3 9 11. 11.
4 road2 1 7 12. 12.
5 road2 2 8 10. 10.
6 road3 1 7 11. 12.
7 road3 3 9 13. 13.
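A note on why round() is needed above: quantile() defaults to type = 7, which interpolates linearly between observations, so the 5th percentile of 10, 11, 12 comes out as 10.1 rather than 10. If the report should only contain delays that were actually observed, one option is type = 1, the inverse of the empirical distribution function. That is an assumption about what the asker wants, not something the question states. A minimal base R sketch using the road1, Day 1, Time 7 delays:

```r
# Delays for road1, Day 1, Time 7 from the question's dataset
delay <- c(10, 11, 12)

# Default (type = 7): linear interpolation between observations
interpolated <- quantile(delay, probs = c(0.05, 0.95))
print(interpolated)   # 5% = 10.1, 95% = 11.9

# type = 1: always returns one of the observed values
observed <- quantile(delay, probs = c(0.05, 0.95), type = 1)
print(observed)       # 5% = 10, 95% = 12
```

The same type = 1 argument can be passed to quantile() inside summarise(), in which case round() should no longer be necessary.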
Here is a solution for summarizing a data set by three grouping variables using the data.table package.
library(data.table)
tab = data.table(my.data)
summary_tab = tab[, list(delay_5pctl = quantile(Delay, probs = 0.05),
                         delay_95pctl = quantile(Delay, probs = 0.95)),
                  by = list(Name, Day, Time)]
summary_tab
# Name Day Time delay_5pctl delay_95pctl
# 1: road1 1 7 10.10 11.90
# 2: road1 2 8 10.00 10.00
# 3: road1 3 9 11.00 11.00
# 4: road2 1 7 12.00 12.00
# 5: road2 2 8 10.00 10.00
# 6: road3 1 7 11.05 11.95
# 7: road3 3 9 13.00 13.00
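The base R attempt in the question can also be made to work. It goes wrong in a few places: Name is missing from the grouping, colnames() is given two names for a result that has more than two columns, and by = ('Day',"Time") is not valid R because the vector needs c(). Below is a minimal sketch of a corrected version. It assumes the formula interface of aggregate(), and the column names Delay_5 and Delay_95 are chosen here for illustration only.

```r
my.data <- read.table(text = '
Name Day Time Delay
road1 1 7 10
road1 1 7 11
road1 1 7 12
road1 2 8 10
road1 3 9 11
road2 1 7 12
road2 2 8 10
road3 1 7 11
road3 1 7 12
road3 3 9 13
', header = TRUE, stringsAsFactors = FALSE)

# Group by all three variables; FUN returns both percentiles at once
my.summary <- aggregate(Delay ~ Name + Day + Time, data = my.data,
                        FUN = function(x) quantile(x, c(0.05, 0.95), na.rm = TRUE))

# aggregate() stores the two percentiles in a single matrix column;
# do.call(data.frame, ...) flattens it into two ordinary columns
my.summary <- do.call(data.frame, my.summary)
colnames(my.summary) <- c("Name", "Day", "Time", "Delay_5", "Delay_95")

# Sort by road and day for readability
my.summary <- my.summary[order(my.summary$Name, my.summary$Day), ]
my.summary
```

The values are the interpolated percentiles, the same as in the data.table output above (for example 10.1 and 11.9 for road1, Day 1, Time 7). Wrap the quantile() call in round() if whole numbers are wanted.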