Calculate differences between groups in R
For an example data frame:
df1 <- structure(list(name = c("a", "b", "c", "d", "e", "f", "g", "h",
"i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u",
"v", "w", "x", "y", "z", "a", "b", "c", "d", "e", "f", "g", "h",
"i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u",
"v", "w", "x", "y", "z", "a", "b", "c", "d", "e", "f", "g", "h",
"i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u",
"v", "w", "x", "y", "z"), amount = c(5.5, 5.4, 5.2, 5.3, 5.1,
5.1, 5, 5, 4.9, 4.5, 6, 5.9, 5.7, 5.4, 5.3, 5.1, 5.6, 5.4, 5.3,
5.6, 4.6, 4.2, 4.5, 4.2, 4, 3.8, 6, 5.8, 5.7, 5.6, 5.3, 5.6,
5.4, 5.5, 5.4, 5.1, 9, 8.8, 8.6, 8.4, 8.2, 8, 7.8, 7.6, 7.4,
7.2, 6, 5.75, 5.5, 5.25, 5, 4.75, 10, 8.9, 7.8, 6.7, 5.6, 4.5,
3.4, 2.3, 1.2, 0.1, 6, 5.8, 5.7, 5.6, 5.5, 5.5, 5.4, 5.6, 5.8,
5.1, 6, 5.5, 5.4, 5.3, 5.2, 5.1), decile = c(1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 1L, 2L, 3L, 4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L,
4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L), time = c(2016L,
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L,
2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L, 2016L,
2016L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L,
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L,
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2017L,
2017L, 2017L, 2017L, 2017L, 2017L, 2017L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L)), .Names = c("name", "amount",
"decile", "time"), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-78L), spec = structure(list(cols = structure(list(name = structure(list(), class = c("collector_character",
"collector")), amount = structure(list(), class = c("collector_double",
"collector")), decile = structure(list(), class = c("collector_integer",
"collector")), time = structure(list(), class = c("collector_integer",
"collector"))), .Names = c("name", "amount", "decile", "time"
)), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
I wish to calculate the mean result for deciles 1, 5 and 10 for each year (2016, 2017, etc.). I then wish to create a final table with the year in the first column, followed by the gap between the mean results for deciles 1 and 10 (i.e. the decile 10 result minus the decile 1 result), and then the gradient between the mean results for deciles 5 and 10 (i.e. the decile 10 mean minus the decile 5 mean).
To illustrate, I have created a worked example of the data for 2016. I list the values for deciles 1, 5 and 10 for 2016, then use these values to work out the gap and the gradient.
summary2016 <- structure(list(`2016` = c(NA_character_, NA_character_, NA_character_,
NA_character_), `1` = c("5", "10", "Gap", "Gradient"), `5.5` = c(5.1,
4.5, 1.4, 0.3), `6` = c(5.3, 5.6, NA, NA), `11.5` = c(10.4, 10.1,
NA, NA)), .Names = c("2016", "1", "5.5", "6", "11.5"), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -4L), spec = structure(list(
cols = structure(list(`2016` = structure(list(), class = c("collector_character",
"collector")), `1` = structure(list(), class = c("collector_character",
"collector")), `5.5` = structure(list(), class = c("collector_double",
"collector")), `6` = structure(list(), class = c("collector_double",
"collector")), `11.5` = structure(list(), class = c("collector_double",
"collector"))), .Names = c("2016", "1", "5.5", "6", "11.5"
)), default = structure(list(), class = c("collector_guess",
"collector"))), .Names = c("cols", "default"), class = "col_spec"))
Can this be done in one step, or would I need to break it down?
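library(tidyverse)
# Keep deciles 1, 5 and 10, then average amount within each year and decile
df1 %>%
  filter(decile %in% c(1, 5, 10)) %>%
  group_by(time, decile) %>%
  summarise(mean = mean(amount)) %>%
  # Still grouped by time here, so mean[1] is decile 1 and mean[2] is decile 5
  mutate(gap1 = mean - mean[1], gap5 = mean - mean[2])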
# A tibble: 9 x 5
# Groups: time [3]
# time decile mean gap1 gap5
# <int> <int> <dbl> <dbl> <dbl>
# 1 2016 1 5.75 0 0.55
# 2 2016 5 5.20 -0.55 0
# 3 2016 10 5.05 -0.7 -0.150
# 4 2017 1 6.4 0 0.775
# 5 2017 5 5.62 -0.775 0
# 6 2017 10 6.15 -0.25 0.525
# 7 2018 1 7.33 0 1.90
# 8 2018 5 5.43 -1.90 0
# 9 2018 10 2.60 -4.73 -2.83
The numbers are different from yours, so perhaps you are looking for some other kind of gap. Your example summary2016 also has a somewhat unusual structure, while the solution above produces more than you asked for, but in a nicer format.
In particular, gap1 is mean(decile i) - mean(decile 1), where i = 1, 5, 10, while gap5 is mean(decile i) - mean(decile 5).
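If a table with one row per year is preferred, as described in the question, the same idea can be collapsed into a single summarise() call. The following is only a sketch: df_small is a hypothetical name for a hand-copied subset of df1 (the decile 1, 5 and 10 rows for 2016 and 2017), and the column names gap and gradient are taken from the question's wording rather than from the answer above. The signs follow the question's definition (decile 10 minus decile 1, and decile 10 minus decile 5).

```r
library(dplyr)

# Hypothetical subset of df1: only the decile 1, 5 and 10 rows
# for 2016 and 2017, copied by hand so the example is self-contained.
df_small <- data.frame(
  time   = c(rep(2016L, 6), rep(2017L, 10)),
  decile = c(1L, 5L, 10L, 1L, 5L, 10L,
             1L, 5L, 1L, 5L, 10L, 1L, 5L, 10L, 1L, 5L),
  amount = c(5.5, 5.1, 4.5, 6, 5.3, 5.6,
             4.6, 4, 6, 5.3, 5.1, 9, 8.2, 7.2, 6, 5)
)

result <- df_small %>%
  filter(decile %in% c(1, 5, 10)) %>%
  group_by(time) %>%
  summarise(
    # gap: mean of decile 10 minus mean of decile 1
    gap      = mean(amount[decile == 10]) - mean(amount[decile == 1]),
    # gradient: mean of decile 10 minus mean of decile 5
    gradient = mean(amount[decile == 10]) - mean(amount[decile == 5])
  )

result
```

For 2016 this gives a gap of -0.7 and a gradient of -0.15, matching the gap1 and gap5 values in the decile 10 row of the output above. Running the same pipeline on the full df1 should work unchanged, since the filter() step already discards the other deciles.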