Creating a summary value when using group_by and summarize
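I often want to show change relative to a given base year. For example, how much (in percent) has something changed since a given year? The gapminder dataset provides a good example.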
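To start answering this, you would group_by year and continent, then summarize the total population. But how do you also get a single reference value, i.e. the 1952 population, to compare every year against?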
library(gapminder)
library(dplyr)
library(ggplot2)

gapminder %>%
  group_by(year, continent) %>%
  summarize(tot_pop = sum(as.numeric(pop)),
            # POP_SUM_1952 does not exist: it stands for the 1952 total I want to look up
            SUMMARY_VAL = POP_SUM_1952,
            CHG_SINCE_1952 = (tot_pop - SUMMARY_VAL) / SUMMARY_VAL) %>%
  ggplot(aes(x = year, y = CHG_SINCE_1952, color = continent)) +
  geom_line()
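For reference, the quantity the code above calls CHG_SINCE_1952 is, for continent $c$ and year $t$, the following. The symbols $P_{c,t}$ and $\mathrm{pop}_{i,t}$ (population of country $i$ in year $t$) are notation introduced here, not from the post:

$$
P_{c,t} = \sum_{i \in c} \mathrm{pop}_{i,t}, \qquad
\mathrm{CHG}_{c,t} = \frac{P_{c,t} - P_{c,1952}}{P_{c,1952}}
$$

Multiply by 100 for a percentage. The missing piece is $P_{c,1952}$: it depends on the continent but not on the year, so it cannot be computed inside a summarize() grouped by both year and continent, because each group only sees the rows for a single year.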
For reference, gapminder looks like this:
# A tibble: 1,704 x 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ... with 1,694 more rows
I'm still trying to come up with a one-step solution. In the meantime, here is a simple two-step solution:
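library(gapminder)
library(dplyr)
library(ggplot2)

# Step 1: total population per continent in the base year 1952
pop_1952 <- filter(gapminder, year == 1952) %>%
  group_by(continent) %>%
  # as.numeric() keeps large totals from overflowing R's 32-bit integers
  summarise(tot_pop_1952 = sum(as.numeric(pop), na.rm = TRUE))

# Step 2: yearly totals per continent, joined to the 1952 totals
gapminder %>%
  group_by(year, continent) %>%
  summarize(tot_pop = sum(as.numeric(pop))) %>%
  left_join(pop_1952, by = "continent") %>%
  mutate(
    CHG_SINCE_1952 = (tot_pop - tot_pop_1952) / tot_pop_1952
  ) %>%
  ggplot(aes(x = year, y = CHG_SINCE_1952, color = continent)) +
  geom_line()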
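To check the mechanics of the two-step approach without the gapminder package, here is a minimal sketch on a hypothetical toy tibble (countries A to C, continents X and Y, and their populations are all made up). It assumes dplyr 1.0 or later for the `.groups` argument, which is used here only to drop the leftover grouping after `summarise()`; the final `arrange()` just fixes the row order.

```r
library(dplyr)

# Hypothetical toy data (not gapminder): two continents, three countries, two years
toy <- tibble(
  country   = c("A", "B", "C", "A", "B", "C"),
  continent = c("X", "X", "Y", "X", "X", "Y"),
  year      = c(1952L, 1952L, 1952L, 1957L, 1957L, 1957L),
  pop       = c(100L, 300L, 50L, 150L, 350L, 75L)
)

# Step 1: one row per continent holding its base-year (1952) total
pop_1952 <- toy %>%
  filter(year == 1952) %>%
  group_by(continent) %>%
  summarise(tot_pop_1952 = sum(as.numeric(pop), na.rm = TRUE))

# Step 2: per-year totals, joined to the base-year totals by continent
chg <- toy %>%
  group_by(year, continent) %>%
  summarise(tot_pop = sum(as.numeric(pop)), .groups = "drop") %>%
  left_join(pop_1952, by = "continent") %>%
  mutate(CHG_SINCE_1952 = (tot_pop - tot_pop_1952) / tot_pop_1952) %>%
  arrange(continent, year)

print(chg)
```

Continent X goes from 400 to 500 (CHG_SINCE_1952 = 0.25) and Y from 50 to 75 (0.5); the 1952 rows are 0 by construction.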
If it helps, here is a single-chain solution (though technically it is still two steps, I suppose):
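gapminder %>%
  mutate(
    # Every row gets its continent's 1952 total: non-1952 pops are zeroed,
    # then ave() sums within each continent and repeats that sum on every row
    tot_pop_1952 = ave(as.numeric(pop) * (year == 1952), continent, FUN = sum)
  ) %>%
  group_by(year, continent) %>%
  summarize(
    tot_pop = sum(as.numeric(pop)),
    # Constant within each group, so mean() simply recovers the 1952 total
    tot_pop_1952 = mean(tot_pop_1952),
    CHG_SINCE_1952 = (tot_pop - tot_pop_1952) / tot_pop_1952
  ) %>%
  ggplot(aes(x = year, y = CHG_SINCE_1952, color = continent)) +
  geom_line()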
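The key trick in the single-chain version is the `ave()` call. Here is a minimal base-R sketch of what it does, using hypothetical toy vectors in place of the gapminder columns: multiplying by the logical `year == 1952` zeroes out every other year, and `ave()` returns a vector the same length as its input in which each element is replaced by its group's result.

```r
# Hypothetical toy vectors, one element per country-year row
pop       <- c(100, 300, 50, 150, 350, 75)
year      <- c(1952, 1952, 1952, 1957, 1957, 1957)
continent <- c("X", "X", "Y", "X", "X", "Y")

# pop * (year == 1952) keeps only the 1952 values; summing within each
# continent gives that continent's 1952 total, and ave() writes the group
# result back onto every row of the group, preserving length and order
base_pop <- ave(pop * (year == 1952), continent, FUN = sum)

print(base_pop)
```

Every X row carries 400 and every Y row carries 50, whatever its year. Because the value is constant within each (year, continent) group, the later `mean(tot_pop_1952)` in `summarize()` just returns it; `dplyr::first()` would work equally well there.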
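Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.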