Group by two columns and summarize multiple columns
I have a data frame that I want to group by the State and Date columns, and then sum the values of the other columns, like this.
df
State Female Male Date
------------------------------
Texas 2 2 01/01/04
Texas 3 1 01/01/04
Texas 5 4 02/01/04
Cali 1 1 05/06/05
Cali 2 1 05/06/05
Cali 3 1 10/06/05
Cali 1 2 10/06/05
NY 10 5 11/06/05
NY 11 6 12/06/05
Expected result
df
State Female Male Date
------------------------------
Texas 5 3 01/01/04
Texas 5 4 02/01/04
Cali 3 2 05/06/05
Cali 4 3 10/06/05
NY 10 5 11/06/05
NY 11 6 12/06/05
I tried using group_by and summarise, but I don't know how to do the same thing for two columns.
My attempt:
df <- df_homicides %>%
group_by(state) %>%
summarise(Female = sum(Female))
Thanks for your help!
We can use across (from dplyr version >= 1.0.0) with summarise:
library(dplyr)
df %>%
group_by(State, Date) %>%
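# a lambda is used because passing extra arguments such as na.rm through across() is deprecated since dplyr 1.1.0
summarise(across(everything(), ~ sum(.x, na.rm = TRUE)), .groups = 'drop')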
# A tibble: 6 x 4
# State Date Female Male
# <chr> <chr> <int> <int>
#1 Cali 05/06/2005 3 2
#2 Cali 10/06/2005 4 3
#3 NY 11/06/2005 10 5
#4 NY 12/06/2005 11 6
#5 Texas 01/01/2004 5 3
#6 Texas 02/01/2004 5 4
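If the real data frame has extra columns that should not be summed, everything() would try to sum those too. A minimal sketch that names the columns explicitly instead, assuming only Female and Male need totals (the data below is rebuilt from the question so the block runs on its own):

```r
library(dplyr)

# Sample data from the question (dates kept as character strings)
df <- data.frame(
  State  = c("Texas", "Texas", "Texas", "Cali", "Cali", "Cali", "Cali", "NY", "NY"),
  Female = c(2L, 3L, 5L, 1L, 2L, 3L, 1L, 10L, 11L),
  Male   = c(2L, 1L, 4L, 1L, 1L, 1L, 2L, 5L, 6L),
  Date   = c("01/01/2004", "01/01/2004", "02/01/2004", "05/06/2005",
             "05/06/2005", "10/06/2005", "10/06/2005", "11/06/2005", "12/06/2005")
)

res <- df %>%
  group_by(State, Date) %>%
  # sum only the listed columns; any other columns are dropped from the result
  summarise(across(c(Female, Male), ~ sum(.x, na.rm = TRUE)), .groups = "drop")

print(res)
```

A selector such as where(is.numeric) can replace c(Female, Male) when every numeric column should be summed.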
Or use aggregate from base R:
aggregate(.~ State + Date, df, sum, na.rm = TRUE)
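One caveat with the formula interface: its default na.action = na.omit drops every row that contains an NA in any column before sum is called, so na.rm = TRUE alone does not keep the non-missing values of that row. A small sketch of the difference, using a copy of the sample data with one NA introduced purely for illustration (df_na is a made-up name):

```r
# Sample data from the question
df <- data.frame(
  State  = c("Texas", "Texas", "Texas", "Cali", "Cali", "Cali", "Cali", "NY", "NY"),
  Female = c(2L, 3L, 5L, 1L, 2L, 3L, 1L, 10L, 11L),
  Male   = c(2L, 1L, 4L, 1L, 1L, 1L, 2L, 5L, 6L),
  Date   = c("01/01/2004", "01/01/2004", "02/01/2004", "05/06/2005",
             "05/06/2005", "10/06/2005", "10/06/2005", "11/06/2005", "12/06/2005")
)

# Hypothetical variant: the first Female value is missing
df_na <- df
df_na$Female[1] <- NA

# Default na.action = na.omit: row 1 is removed entirely, so its Male value is lost too
res_default <- aggregate(. ~ State + Date, df_na, sum, na.rm = TRUE)

# na.pass keeps the row; na.rm = TRUE then skips only the missing Female value
res_pass <- aggregate(. ~ State + Date, df_na, sum, na.rm = TRUE, na.action = na.pass)

print(res_default)
print(res_pass)
```

With the original data (no missing values) both calls give the same totals, so this only matters once NAs appear.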
Data used:
df <- structure(list(State = c("Texas", "Texas", "Texas", "Cali", "Cali",
"Cali", "Cali", "NY", "NY"), Female = c(2L, 3L, 5L, 1L, 2L, 3L,
1L, 10L, 11L), Male = c(2L, 1L, 4L, 1L, 1L, 1L, 2L, 5L, 6L),
Date = c("01/01/2004", "01/01/2004", "02/01/2004", "05/06/2005",
"05/06/2005", "10/06/2005", "10/06/2005", "11/06/2005", "12/06/2005"
)), class = "data.frame", row.names = c(NA, -9L))
Try this. You can use summarise_all() to aggregate multiple variables with the desired function. Here is the code:
library(dplyr)
#Code
df %>%
  group_by(State, Date) %>%
  summarise_all(.funs = sum, na.rm = TRUE)
Output:
# A tibble: 6 x 4
# Groups: State [3]
State Date Female Male
<chr> <chr> <int> <int>
1 Cali 05/06/2005 3 2
2 Cali 10/06/2005 4 3
3 NY 11/06/2005 10 5
4 NY 12/06/2005 11 6
5 Texas 01/01/2004 5 3
6 Texas 02/01/2004 5 4
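Note the "# Groups: State [3]" line in the output: summarise only peels off the last grouping variable, so the result is still grouped by State and later verbs would operate per state. A minimal sketch that drops the leftover grouping with ungroup(), assuming a plain ungrouped tibble is what is wanted downstream:

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  State  = c("Texas", "Texas", "Texas", "Cali", "Cali", "Cali", "Cali", "NY", "NY"),
  Female = c(2L, 3L, 5L, 1L, 2L, 3L, 1L, 10L, 11L),
  Male   = c(2L, 1L, 4L, 1L, 1L, 1L, 2L, 5L, 6L),
  Date   = c("01/01/2004", "01/01/2004", "02/01/2004", "05/06/2005",
             "05/06/2005", "10/06/2005", "10/06/2005", "11/06/2005", "12/06/2005")
)

res <- df %>%
  group_by(State, Date) %>%
  summarise_all(.funs = sum, na.rm = TRUE) %>%
  ungroup()  # remove the remaining State grouping

print(res)
```

summarise_all() still works but is marked as superseded in dplyr 1.0.0 and later; summarise(across(...)) is the replacement suggested by the dplyr documentation.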
Data used:
#Data
df <- structure(list(State = c("Texas", "Texas", "Texas", "Cali", "Cali",
"Cali", "Cali", "NY", "NY"), Female = c(2L, 3L, 5L, 1L, 2L, 3L,
1L, 10L, 11L), Male = c(2L, 1L, 4L, 1L, 1L, 1L, 2L, 5L, 6L),
Date = c("01/01/2004", "01/01/2004", "02/01/2004", "05/06/2005",
"05/06/2005", "10/06/2005", "10/06/2005", "11/06/2005", "12/06/2005"
)), class = "data.frame", row.names = c(NA, -9L))