
Split groups using conditions and get percentile value

I have the dataset below, which contains traffic delay data for several roads. I would like a summary report for each road, grouped by "Day" and "Time", with the 5th and 95th percentiles of "Delay" calculated for each group.

Here is the dataset:

my.data <- read.table(text = '
                        Name     Day  Time   Delay     
                      road1        1      7   10
                      road1        1      7   11
                      road1        1      7   12
                      road1        2      8   10       
                      road1        3      9   11       
                      road2        1      7   12       
                      road2        2      8   10       
                      road3        1      7   11       
                      road3        1      7   12       
                      road3        3      9   13        
                      ', header = TRUE, stringsAsFactors = FALSE, na.strings = 'NA')

I would like to get this kind of report:

# result:
# Name       Day      Time      Delay_5%     Delay_95%
# road1       1         7          10           12
# road1       2         8          10           10
# road1       3         9          11           11
# road2       1         7          12           12
# road2       2         8          10           10
# road3       1         7          11           12
# road3       3         9          13           13

I wrote the script below, but it does not give me the desired result:

my.summary <- with(my.data, aggregate(list(Delay), by = list(Day,Time), 
                                      FUN = function(x) { road.percentile = quantile(x,c(0.05,0.95),na.rm = TRUE) } ))

my.summary <- do.call(data.frame, my.summary)

colnames(my.summary) <- c('Day', 'Rate')
my.summary

my.data <- merge(my.data, my.summary, by = ('Day',"Time"))
my.data

Could anyone help me solve this problem? Much appreciated!
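
For reference, here is a base R sketch of what the attempt above seems to be aiming for. The attempt groups only by Day and Time (leaving out Name), names only two of the resulting columns, and the merge() call has a syntax error in by = ('Day',"Time"). The version below is one possible way to do it, not the only one; the column names Delay_5 and Delay_95 are chosen here for illustration.

```r
# Same dataset as in the question
my.data <- read.table(text = '
  Name  Day Time Delay
  road1   1    7    10
  road1   1    7    11
  road1   1    7    12
  road1   2    8    10
  road1   3    9    11
  road2   1    7    12
  road2   2    8    10
  road3   1    7    11
  road3   1    7    12
  road3   3    9    13
', header = TRUE, stringsAsFactors = FALSE)

# Group by all three keys; FUN returns both quantiles at once
my.summary <- aggregate(Delay ~ Name + Day + Time, data = my.data,
                        FUN = function(x) quantile(x, c(0.05, 0.95), na.rm = TRUE))

# aggregate() stores the two quantiles as a matrix column; flatten it
my.summary <- do.call(data.frame, my.summary)

# Illustrative column names (not from the original post)
colnames(my.summary) <- c("Name", "Day", "Time", "Delay_5", "Delay_95")

# Sort by road, then day, to match the layout of the desired report
my.summary <- my.summary[order(my.summary$Name, my.summary$Day), ]
my.summary
```

The values are not rounded here, so wrap the quantile() call in round() if whole numbers are wanted, as in the desired report.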

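Using dplyr:

library(dplyr)

# Group by road, day, and time, then take the rounded
# 5th and 95th percentiles of Delay within each group
my.data %>%
  group_by(Name, Day, Time) %>%
  summarise(Delay_5 = round(quantile(Delay, 0.05), 0),
            Delay_95 = round(quantile(Delay, 0.95), 0))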

# A tibble: 7 x 5
# Groups:   Name, Day [?]
  Name    Day  Time Delay_5 Delay_95
  <chr> <int> <int>   <dbl>    <dbl>
1 road1     1     7     10.      12.
2 road1     2     8     10.      10.
3 road1     3     9     11.      11.
4 road2     1     7     12.      12.
5 road2     2     8     10.      10.
6 road3     1     7     11.      12.
7 road3     3     9     13.      13.

Here is a solution that summarizes the dataset by three grouping variables using the data.table package.

library(data.table)

tab = data.table(my.data)

summary_tab = tab[, list(delay_5pctl=quantile(Delay, probs=0.05), 
                         delay_95pctl=quantile(Delay, probs=0.95)), 
                  by=list(Name, Day, Time)]

summary_tab
#     Name Day Time delay_5pctl delay_95pctl
# 1: road1   1    7       10.10        11.90
# 2: road1   2    8       10.00        10.00
# 3: road1   3    9       11.00        11.00
# 4: road2   1    7       12.00        12.00
# 5: road2   2    8       10.00        10.00
# 6: road3   1    7       11.05        11.95
# 7: road3   3    9       13.00        13.00
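
The unrounded values above (for example 10.10 and 11.90 for road1 on Day 1) differ from the whole numbers in the desired report because quantile() interpolates linearly between observations by default (type = 7). A small sketch of the difference, using the three delays from that group; the variable names are illustrative only:

```r
delays <- c(10, 11, 12)  # road1, Day 1, Time 7

# Default (type = 7): linear interpolation between order statistics
q_interp <- quantile(delays, probs = c(0.05, 0.95))

# type = 1: inverse of the empirical CDF, always returns an observed value
q_obs <- quantile(delays, probs = c(0.05, 0.95), type = 1)

q_interp
q_obs
```

If the report should only contain delays that were actually observed, passing type = 1 may be a closer fit than rounding the interpolated values, although both give the same numbers for this dataset.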


 