简体   繁体   中英

Conditional sum across multiple columns using dplyr?

I have a dataset like the following but with about 100 columns of different animals

location <- c("A","A","A","A","B","B","C","C","D", "D","D")
season <- c("2", "2", "3", "4","2","3","1","2","2","4","4")
cat <- c(1,1,1,1,0,1,1,1,0,1,0)
dog <- c(0,0,1,1,1,1,0,1,0,1,1)

df <- data.frame(location, season,cat, dog)

     location season  cat  dog
1          A       2    1    0
2          A       2    1    0
3          A       3    1    1
4          A       4    1    1
5          B       2    0    1
6          B       3    1    1
7          C       1    1    0
8          C       2    1    1
9          D       2    0    0
10         D       4    1    1
11         D       4    0    1

I am attempting to sum all the animal columns based on the location and season, but I want a species column and its corresponding total column for each unique combination of location and season. Not all animal columns have a 1 value for every combination of location and season and they all have different names(ie different animals). I would like to delete any row of location and season with a species total = 0

Something like this:

    location season species n
1         A       2    cat  2  
2         A       3    cat  1
3         A       4    cat  1
4         B       3    cat  1
5         C       1    cat  1
6         C       2    cat  1
7         D       4    cat  1
8         A       3    dog  1
9         A       4    dog  1
10        B       2    dog  1
11        B       3    dog  1
12        C       2    dog  1
13        D       4    dog  2

I think dplyr is the way to go but I can't seem to get it right. Thank you!

df %>% group_by(location, season) %>%
  summarise(across(c(cat, dog), ~sum(.))) %>%
  pivot_longer(cols = c(cat, dog), names_to = "species", values_to = "n") %>%
  arrange(species, location, season) %>%
  filter(n != 0)

# A tibble: 13 x 4
# Groups:   location [4]
   location season species     n
   <chr>    <chr>  <chr>   <dbl>
 1 A        2      cat         2
 2 A        3      cat         1
 3 A        4      cat         1
 4 B        3      cat         1
 5 C        1      cat         1
 6 C        2      cat         1
 7 D        4      cat         1
 8 A        3      dog         1
 9 A        4      dog         1
10 B        2      dog         1
11 B        3      dog         1
12 C        2      dog         1
13 D        4      dog         2

Get the data in long format, for each location , season and Species sum the values and remove rows which have 0 values.

library(dplyr)

df %>%
  tidyr::pivot_longer(cols = cat:dog, names_to = 'Species') %>%
  group_by(location, season, Species) %>%
  summarise(value = sum(value)) %>%
  ungroup %>%
  filter(value > 0)

#  location season Species value
#   <chr>    <chr>  <chr>   <dbl>
# 1 A        2      cat         2
# 2 A        3      cat         1
# 3 A        3      dog         1
# 4 A        4      cat         1
# 5 A        4      dog         1
# 6 B        2      dog         1
# 7 B        3      cat         1
# 8 B        3      dog         1
# 9 C        1      cat         1
#10 C        2      cat         1
#11 C        2      dog         1
#12 D        4      cat         1
#13 D        4      dog         2

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM