Sum by groups in two columns in R
I have the following DF:
DAY BRAND SOLD
2018/04/10 KIA 10
2018/04/15 KIA 5
2018/05/01 KIA 7
2018/05/06 KIA 3
2018/04/04 BMW 2
2018/05/25 BMW 8
2018/06/19 BMW 5
2018/06/14 BMW 1
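I want to sum the units sold by month and repeat that total on every row whose date falls in that month. One condition: the total must not mix different brands within the same month. The expected result is: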
DAY BRAND SOLD TOTAL
2018/04/10 KIA 10 15
2018/04/15 KIA 5 15
2018/05/01 KIA 7 10
2018/05/06 KIA 3 10
2018/04/04 BMW 2 2
2018/05/25 BMW 8 8
2018/06/19 BMW 5 6
2018/06/14 BMW 1 6
How can I do this?
We can use ave after extracting the month from the "DAY" column, using it together with "BRAND" as the grouping variables:
df1$TOTAL <- with(df1, ave(SOLD, BRAND,
    format(as.Date(DAY, "%Y/%m/%d"), "%m"), FUN = sum))
df1$TOTAL
#[1] 15 15 10 10 2 8 6 6
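One caveat: format(..., "%m") keeps only the month number, so April 2018 and April 2019 would land in the same group if the data ever spanned more than one year. The question's data covers only 2018, so this does not affect the result above. A minimal sketch, assuming year-month grouping is what is wanted in that case; df2 and its two 2019 rows are invented purely for illustration:

```r
# Hypothetical data: same layout as df1, with two made-up 2019 rows added
df2 <- data.frame(
  DAY   = c("2018/04/10", "2018/04/15", "2019/04/02", "2019/04/20"),
  BRAND = c("KIA", "KIA", "KIA", "KIA"),
  SOLD  = c(10L, 5L, 4L, 6L),
  stringsAsFactors = FALSE
)

# Group by BRAND and year-month ("%Y-%m") rather than month alone ("%m")
ym <- format(as.Date(df2$DAY, "%Y/%m/%d"), "%Y-%m")
df2$TOTAL <- ave(df2$SOLD, df2$BRAND, ym, FUN = sum)
df2$TOTAL
#[1] 15 15 10 10
```

With "%m" alone, all four rows would share one group and each would get 25.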
Or with dplyr/lubridate:
library(dplyr)
library(lubridate)
df1 %>%
    group_by(BRAND, MONTH = month(ymd(DAY))) %>%
    mutate(TOTAL = sum(SOLD))
# A tibble: 8 x 5
# Groups: BRAND, MONTH [5]
# DAY BRAND SOLD MONTH TOTAL
# <chr> <chr> <int> <dbl> <int>
#1 2018/04/10 KIA 10 4 15
#2 2018/04/15 KIA 5 4 15
#3 2018/05/01 KIA 7 5 10
#4 2018/05/06 KIA 3 5 10
#5 2018/04/04 BMW 2 4 2
#6 2018/05/25 BMW 8 5 8
#7 2018/06/19 BMW 5 6 6
#8 2018/06/14 BMW 1 6 6
If needed, call ungroup() and then drop the "MONTH" column with select(-MONTH).
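Put together, the whole pipeline might look like the sketch below. It assumes dplyr and lubridate are installed; df1 is rebuilt inline so the snippet runs on its own, and res is just an illustrative name for the result:

```r
library(dplyr)
library(lubridate)

# Same data as in the question, rebuilt here so the snippet is self-contained
df1 <- data.frame(
  DAY   = c("2018/04/10", "2018/04/15", "2018/05/01", "2018/05/06",
            "2018/04/04", "2018/05/25", "2018/06/19", "2018/06/14"),
  BRAND = c("KIA", "KIA", "KIA", "KIA", "BMW", "BMW", "BMW", "BMW"),
  SOLD  = c(10L, 5L, 7L, 3L, 2L, 8L, 5L, 1L),
  stringsAsFactors = FALSE
)

res <- df1 %>%
  group_by(BRAND, MONTH = month(ymd(DAY))) %>%  # temporary grouping column
  mutate(TOTAL = sum(SOLD)) %>%                 # per-group sum repeated on each row
  ungroup() %>%                                 # drop grouping before removing MONTH
  select(-MONTH)

res$TOTAL
#[1] 15 15 10 10  2  8  6  6
```

The ungroup() step comes first because select() keeps grouping columns on a grouped tibble, so select(-MONTH) alone would not remove "MONTH".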
df1 <- structure(list(DAY = c("2018/04/10", "2018/04/15", "2018/05/01",
"2018/05/06", "2018/04/04", "2018/05/25", "2018/06/19", "2018/06/14"
), BRAND = c("KIA", "KIA", "KIA", "KIA", "BMW", "BMW", "BMW",
"BMW"), SOLD = c(10L, 5L, 7L, 3L, 2L, 8L, 5L, 1L)),
class = "data.frame", row.names = c(NA,
-8L))