
Create summary value when using group_by and summarize

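I often want to show the change relative to a given base year. For example, how much has something changed (in percent) since a given year? The gapminder dataset provides a good example: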

[Figure: population change]

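To start getting to an answer, you would group_by year and continent, then summarize to get the sum of the population. But how do you get a summary value, i.e. the 1952 population, inside that same call?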

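library(gapminder)
library(dplyr)
library(ggplot2)

# POP_SUM_1952 is a placeholder for the value I don't know how to compute:
# the total 1952 population of each continent
gapminder %>%
  group_by(year, continent) %>%
  summarize(tot_pop = sum(as.numeric(pop)),
            SUMMARY_VAL = POP_SUM_1952,
            CHG_SINCE_1952 = (tot_pop - SUMMARY_VAL) / SUMMARY_VAL) %>%
  ggplot(aes(x = year, y = CHG_SINCE_1952, color = continent)) +
  geom_line()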

FYI, gapminder looks like this:

# A tibble: 1,704 x 6
   country     continent  year lifeExp      pop gdpPercap
   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
 1 Afghanistan Asia       1952    28.8  8425333      779.
 2 Afghanistan Asia       1957    30.3  9240934      821.
 3 Afghanistan Asia       1962    32.0 10267083      853.
 4 Afghanistan Asia       1967    34.0 11537966      836.
 5 Afghanistan Asia       1972    36.1 13079460      740.
 6 Afghanistan Asia       1977    38.4 14880372      786.
 7 Afghanistan Asia       1982    39.9 12881816      978.
 8 Afghanistan Asia       1987    40.8 13867957      852.
 9 Afghanistan Asia       1992    41.7 16317921      649.
10 Afghanistan Asia       1997    41.8 22227415      635.
# ... with 1,694 more rows

I'm trying to come up with a one-step solution. In the meantime, here is a simple two-step solution:

pop_1952 <- filter(gapminder, year == 1952) %>%
  group_by(continent) %>%
  summarise(tot_pop_1952 = sum(as.numeric(pop), na.rm = TRUE))

gapminder %>%
  group_by(year, continent) %>%
  summarize(tot_pop = sum(as.numeric(pop))) %>%
  left_join(pop_1952, by = "continent") %>%
  mutate(
    CHG_SINCE_1952 = (tot_pop - tot_pop_1952) / tot_pop_1952
  ) %>%
  ggplot(aes(x = year, y = CHG_SINCE_1952, color = continent)) +
  geom_line()
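
Since the question is about change from any given base year, the two steps above could be wrapped in a small function. This is only a sketch: the helper name `pct_change_since`, the `base_pop` and `chg` column names, and the toy data are all made up for illustration and are not part of the original answer.

```r
library(dplyr)

# Hypothetical helper (not from the original answer): percent change in total
# population per continent relative to `base_year`, using the join approach.
pct_change_since <- function(df, base_year) {
  # Step 1: one baseline total per continent
  base <- df %>%
    filter(year == base_year) %>%
    group_by(continent) %>%
    summarise(base_pop = sum(as.numeric(pop)), .groups = "drop")

  # Step 2: totals per year and continent, joined to the baseline
  df %>%
    group_by(year, continent) %>%
    summarise(tot_pop = sum(as.numeric(pop)), .groups = "drop") %>%
    left_join(base, by = "continent") %>%
    mutate(chg = (tot_pop - base_pop) / base_pop)
}

# Toy data with made-up numbers, only to show the shape of the result
toy <- tibble(
  country   = c("a", "b", "a", "b", "c", "c"),
  continent = c("X", "X", "X", "X", "Y", "Y"),
  year      = c(1952, 1952, 1957, 1957, 1952, 1957),
  pop       = c(10, 30, 20, 40, 100, 150)
)
res <- pct_change_since(toy, 1952)
```

With gapminder loaded, `pct_change_since(gapminder, 1952)` should give the same columns, ready to pipe into ggplot.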

If it helps, here is a single-chain solution (technically still two steps, I guess):

gapminder %>%
  mutate(
    tot_pop_1952 = ave(as.numeric(pop)*(year == 1952), continent, FUN = sum)
  ) %>%
  group_by(year, continent) %>%
  summarize(
    tot_pop = sum(as.numeric(pop)),
    tot_pop_1952 = mean(tot_pop_1952),
    CHG_SINCE_1952 = (tot_pop - tot_pop_1952) / tot_pop_1952
  ) %>%
  ggplot(aes(x = year, y = CHG_SINCE_1952, color = continent)) +
  geom_line()
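
The `ave()` line is the least obvious part of that chain, so here it is in isolation on a tiny data frame. The data and the `base_pop` column name are invented for illustration. Multiplying by `(year == 1952)` zeroes out every non-baseline row, and `ave()` then repeats each continent's sum on every row of that continent.

```r
# Toy data (made-up values, not from gapminder)
toy <- data.frame(
  continent = c("A", "A", "A", "A", "B", "B"),
  year      = c(1952, 1952, 1957, 1957, 1952, 1957),
  pop       = c(10, 20, 15, 25, 100, 150)
)

# The logical (year == 1952) is coerced to 0/1, so only base-year rows contribute
masked <- toy$pop * (toy$year == 1952)

# ave() returns a vector as long as its input, with each group's sum repeated
toy$base_pop <- ave(masked, toy$continent, FUN = sum)
toy$base_pop
# 30 30 30 30 100 100
```

Because the baseline is constant within a continent, taking `mean()` of it inside `summarize` simply recovers that constant.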

A one-step dplyr solution.

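gapminder %>%
  group_by(year, continent) %>%
  summarize(tot_pop = sum(as.numeric(pop))) %>%
  # regroup by continent so tot_pop[year == 1952] is one value per group
  group_by(continent) %>%
  mutate(CHG_SINCE_1952 = (tot_pop - tot_pop[year == 1952]) / tot_pop[year == 1952]) %>%
  ungroup() %>%
  ggplot(aes(x = year, y = CHG_SINCE_1952, color = continent)) +
  geom_line()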
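
The baseline lookup `tot_pop[year == 1952]` is easiest to reason about when it is evaluated within each continent, where it yields exactly one value per group. A minimal sketch on made-up totals (the `totals` tibble and the `chg_since_1952` column name are assumptions for illustration):

```r
library(dplyr)

# Made-up totals, already summarised to one row per year and continent
totals <- tibble(
  year      = c(1952, 1952, 1957, 1957),
  continent = c("X", "Y", "X", "Y"),
  tot_pop   = c(50, 200, 75, 250)
)

chg <- totals %>%
  group_by(continent) %>%
  # Within each continent, tot_pop[year == 1952] is a single value
  mutate(chg_since_1952 = (tot_pop - tot_pop[year == 1952]) / tot_pop[year == 1952]) %>%
  ungroup()

chg$chg_since_1952
# 0.00 0.00 0.50 0.25
```

Grouping by continent means the result does not depend on how the rows happen to be sorted.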
