Percentiles on 2 conditions in R

Question

I have the following data frame with 3 variables and several observations:

data <- read.table(text="
YEAR SECTOR VALUE
2016   A      2
2016   A      5
2016   A      10
2016   A      20
2016   A      50
2016   A     100
2016   A     200
2016   A     300
2016   B      20
2016   B      50
2016   B      100
2016   B      200
2016   B      500
2016   B     1000
2016   B     2000
2016   B     3000
2017   A      21
2017   A      51
2017   A      101
2017   A      201
2017   A      501
2017   A     1001
2017   A     2001
2017   A     3001
2017   B      201
2017   B      501
2017   B      1001
2017   B      2001
2017   B      5001
2016   B     10001
2017   B     20001
2017   B     30001", 
               header=TRUE)

I would like to calculate the 1st quartile, median and 3rd quartile within each YEAR + SECTOR combination. For instance, the 1st quartile for SECTOR A and YEAR 2016 would return 5, based on (2, 5, 10, 20, 50, 100, 200, 300).

Answer 1

One option is to group by 'YEAR' and 'SECTOR', store the relevant subset of fivenum() in a tibble, unnest it, and then spread it to 'wide' format. Note that fivenum() returns Tukey's hinges, which can differ from the quartiles returned by quantile().

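library(dplyr)
library(tidyr)
df1 %>%
  group_by(YEAR, SECTOR) %>%
  # group_modify() is the current name for what dplyr 0.8.0 called group_map()
  group_modify(~ .x %>%
       summarise(val = list(tibble(categ = c('1st quart', 'median', '3rd quart'),
            # elements 2:4 of fivenum() are lower hinge, median, upper hinge
            val = fivenum(VALUE)[2:4])))) %>%
  unnest(val) %>%
  spread(categ, val)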
# A tibble: 4 x 5
# Groups:   YEAR, SECTOR [4]
#   YEAR SECTOR `1st quart` `3rd quart` median
#  <int> <chr>        <dbl>       <dbl>  <dbl>
#1  2016 A              7.5         150     35
#2  2016 B            100          2000    500
#3  2017 A             76          1501    351
#4  2017 B            751         12501   2001

Data

df1 <- structure(list(YEAR = c(2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 
2016L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 
2017L, 2017L, 2017L, 2017L, 2017L, 2016L, 2017L, 2017L), SECTOR = c("A", 
"A", "A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B", 
"B", "B", "A", "A", "A", "A", "A", "A", "A", "A", "B", "B", "B", 
"B", "B", "B", "B", "B"), VALUE = c(2L, 5L, 10L, 20L, 50L, 100L, 
200L, 300L, 20L, 50L, 100L, 200L, 500L, 1000L, 2000L, 3000L, 
21L, 51L, 101L, 201L, 501L, 1001L, 2001L, 3001L, 201L, 501L, 
1001L, 2001L, 5001L, 10001L, 20001L, 30001L)), class = "data.frame",
row.names = c(NA, 
-32L))

Answer 2

How about this:

library(dplyr)
data %>% 
  group_by(SECTOR,YEAR) %>% 
  summarise(median = summary(VALUE)[3],
            q1 = summary(VALUE)[2],
            q3 = summary(VALUE)[5])

However, according to summary(), the first quartile for the example you provided is 8.75, not 5.
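The gap between 5 and 8.75 comes down to which quartile definition is used. As a minimal sketch (base R only, using the SECTOR A / YEAR 2016 values from the question; the variable names are just for illustration), the following compares the default used by summary() with two alternatives:

```r
x <- c(2, 5, 10, 20, 50, 100, 200, 300)

# Default quantile() (type = 7) interpolates linearly between order
# statistics; this is also what summary() reports.
q_type7 <- quantile(x, probs = 0.25, names = FALSE)

# type = 1 is the inverse of the empirical CDF, so it always returns
# one of the observed values (no interpolation).
q_type1 <- quantile(x, probs = 0.25, type = 1, names = FALSE)

# fivenum() returns Tukey's hinges; element 2 is the lower hinge.
q_hinge <- fivenum(x)[2]

c(type7 = q_type7, type1 = q_type1, hinge = q_hinge)
# type7 = 8.75, type1 = 5, hinge = 7.5
```

So if 5 is the value you want, passing type = 1 to quantile() inside the grouped summary should reproduce it. Whether that is the right definition for your data is a separate question.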

Answer 3

probs = c(0.25, 0.5, 0.75)
ans = Reduce(function(x1, x2) merge(x1, x2, by = c("YEAR", "SECTOR")),
             lapply(probs, function(p)
                 aggregate(x = setNames(list(df1$VALUE), paste0("Q_",p)),
                           by = df1[c("YEAR", "SECTOR")],
                           FUN = function(x) quantile(x, probs = p))))
ans
#  YEAR SECTOR Q_0.25 Q_0.5 Q_0.75
#1 2016      A   8.75    35    125
#2 2016      B 100.00   500   2000
#3 2017      A  88.50   351   1251
#4 2017      B 751.00  2001  12501
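If the merge step feels heavy, a possible shortcut (a sketch only, not part of the original answer) is a single aggregate() call with the formula interface, passing all three probabilities at once. The names res and flat are made up for this example, and df1 is rebuilt here so the snippet runs on its own:

```r
# Same 32 rows as df1 above (note that row 30 is YEAR 2016, SECTOR B).
df1 <- data.frame(
  YEAR   = c(rep(2016L, 16), rep(2017L, 13), 2016L, 2017L, 2017L),
  SECTOR = c(rep("A", 8), rep("B", 8), rep("A", 8), rep("B", 8)),
  VALUE  = c(2, 5, 10, 20, 50, 100, 200, 300,
             20, 50, 100, 200, 500, 1000, 2000, 3000,
             21, 51, 101, 201, 501, 1001, 2001, 3001,
             201, 501, 1001, 2001, 5001, 10001, 20001, 30001)
)

# Extra arguments (here probs) are passed on to FUN for each group.
res <- aggregate(VALUE ~ YEAR + SECTOR, data = df1,
                 FUN = quantile, probs = c(0.25, 0.5, 0.75))

# aggregate() stores the three quantiles in one matrix column (res$VALUE);
# spread that matrix into ordinary columns.
flat <- cbind(res[c("YEAR", "SECTOR")], as.data.frame(res$VALUE))
flat
```

The row order may differ from the merged result above, since aggregate() and merge() sort the groups differently.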

Answer 4

Another method is to use the quantile() function with dplyr:

library(dplyr)

data %>% 
  group_by(SECTOR, YEAR) %>% 
  # pass probs explicitly: quantile(VALUE)[1] would be the minimum (0%)
  summarize(q1 = quantile(VALUE, 0.25), 
            median = quantile(VALUE, 0.5), 
            q3 = quantile(VALUE, 0.75))

##   SECTOR  YEAR     q1 median    q3
##   <fct>  <int>  <dbl>  <dbl> <dbl>
## 1 A       2016   8.75     35   125
## 2 A       2017  88.5     351  1251
## 3 B       2016 100       500  2000
## 4 B       2017 751      2001 12501
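One thing to watch with quantile(): called without probs it returns five values (0%, 25%, 50%, 75%, 100%), so positions 1 to 3 are the minimum, first quartile and median rather than the three quartiles. A small base R sketch with the SECTOR A / YEAR 2016 values (variable names are only for illustration):

```r
x <- c(2, 5, 10, 20, 50, 100, 200, 300)

q <- quantile(x)
names(q)
# "0%" "25%" "50%" "75%" "100%"

# Selecting by name avoids off-by-one mistakes with positions.
by_name <- q[c("25%", "50%", "75%")]

# Equivalent: ask only for the probabilities you need.
by_probs <- quantile(x, probs = c(0.25, 0.5, 0.75))

by_name
```

Either form gives 8.75, 35 and 125 for this vector with the default quantile type.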

4 answers

Answer 1: 0 2019-02-26 17:44:19
Answer 2: 0 2019-02-26 17:44:45
Answer 3: 0 2019-02-26 18:19:59
Answer 4: 0 2019-02-26 19:21:10