Conditional sum across multiple columns using dplyr?
I have a dataset like the one below, but with about 100 columns for different animals:
location <- c("A","A","A","A","B","B","C","C","D", "D","D")
season <- c("2", "2", "3", "4","2","3","1","2","2","4","4")
cat <- c(1,1,1,1,0,1,1,1,0,1,0)
dog <- c(0,0,1,1,1,1,0,1,0,1,1)
df <- data.frame(location, season,cat, dog)
   location season cat dog
1         A      2   1   0
2         A      2   1   0
3         A      3   1   1
4         A      4   1   1
5         B      2   0   1
6         B      3   1   1
7         C      1   1   0
8         C      2   1   1
9         D      2   0   0
10        D      4   1   1
11        D      4   0   1
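I'm trying to sum all the animal columns by location and season, but I'd like a species column and a corresponding total column for each unique combination of location and season. Not every animal column has a value of 1 for every combination of location and season, and the columns all have different names (i.e. different animals). I'd also like to drop any location and season rows where the species total = 0.
Something like this: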
   location season species n
1         A      2     cat 2
2         A      3     cat 1
3         A      4     cat 1
4         B      3     cat 1
5         C      1     cat 1
6         C      2     cat 1
7         D      4     cat 1
8         A      3     dog 1
9         A      4     dog 1
10        B      2     dog 1
11        B      3     dog 1
12        C      2     dog 1
13        D      4     dog 2
I think dplyr is the way to go, but I can't seem to get it right. Thanks!
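library(dplyr)
library(tidyr)  # provides pivot_longer()

df %>% group_by(location, season) %>%
  summarise(across(c(cat, dog), ~sum(.))) %>%   # one total per animal column
  pivot_longer(cols = c(cat, dog), names_to = "species", values_to = "n") %>%
  arrange(species, location, season) %>%
  filter(n != 0)                                # drop combinations with no sightings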
# A tibble: 13 x 4
# Groups:   location [4]
   location season species     n
   <chr>    <chr>  <chr>   <dbl>
 1 A        2      cat         2
 2 A        3      cat         1
 3 A        4      cat         1
 4 B        3      cat         1
 5 C        1      cat         1
 6 C        2      cat         1
 7 D        4      cat         1
 8 A        3      dog         1
 9 A        4      dog         1
10 B        2      dog         1
11 B        3      dog         1
12 C        2      dog         1
13 D        4      dog         2
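The question mentions roughly 100 animal columns, so listing each one inside c(cat, dog) would not scale. The following is a minimal sketch of one way to avoid naming them, assuming every column other than location and season is a numeric animal column (that assumption is mine, not stated in the answer above). Inside summarise(), across(everything(), ...) leaves the grouping columns out, and a negative selection in pivot_longer() picks up the remaining columns.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  location = c("A", "A", "A", "A", "B", "B", "C", "C", "D", "D", "D"),
  season   = c("2", "2", "3", "4", "2", "3", "1", "2", "2", "4", "4"),
  cat      = c(1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0),
  dog      = c(0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1)
)

result <- df %>%
  group_by(location, season) %>%
  # everything() skips the grouping columns, so all remaining columns
  # are summed; this assumes they are all numeric animal columns
  summarise(across(everything(), sum), .groups = "drop") %>%
  # reshape every non-key column into species / n pairs
  pivot_longer(cols = -c(location, season),
               names_to = "species", values_to = "n") %>%
  filter(n != 0) %>%
  arrange(species, location, season)

print(result)
```

On the example data this should give the same 13 rows as above, just without the leftover grouping, since .groups = "drop" returns an ungrouped tibble.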
Get the data in long format, sum the values for each location, season and Species, and drop the rows with a 0 value.
library(dplyr)

df %>%
  tidyr::pivot_longer(cols = cat:dog, names_to = 'Species') %>%
  group_by(location, season, Species) %>%
  summarise(value = sum(value)) %>%
  ungroup() %>%
  filter(value > 0)
#   location season Species value
#   <chr>    <chr>  <chr>   <dbl>
# 1 A        2      cat         2
# 2 A        3      cat         1
# 3 A        3      dog         1
# 4 A        4      cat         1
# 5 A        4      dog         1
# 6 B        2      dog         1
# 7 B        3      cat         1
# 8 B        3      dog         1
# 9 C        1      cat         1
#10 C        2      cat         1
#11 C        2      dog         1
#12 D        4      cat         1
#13 D        4      dog         2
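As a possible shorthand for the group_by / summarise / ungroup steps, dplyr's count() can take a weight column through its wt argument, in which case it sums that column per group instead of counting rows. This is a sketch of that variant, not part of the original answer. The negative column selection assumes that everything except location and season is an animal column, and the output name n is chosen here to match the layout requested in the question.

```r
library(dplyr)
library(tidyr)

# Example data from the question
df <- data.frame(
  location = c("A", "A", "A", "A", "B", "B", "C", "C", "D", "D", "D"),
  season   = c("2", "2", "3", "4", "2", "3", "1", "2", "2", "4", "4"),
  cat      = c(1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 0),
  dog      = c(0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1)
)

result2 <- df %>%
  # long format: one row per original row and animal column
  pivot_longer(cols = -c(location, season),
               names_to = "species", values_to = "value") %>%
  # with wt supplied, count() sums `value` within each group
  count(location, season, species, wt = value, name = "n") %>%
  filter(n > 0)

print(result2)
```

Because count() is called on an ungrouped data frame, the result is ungrouped as well, so no separate ungroup() step is needed.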