
Sum a column based on the value of a cell in another column of the same row in R

I have this data frame:

names <- c("george","fred","bill","george",'fred',"bill")
val1  <- c(2,3,4,6,7,8)
val2  <- c(3,4,5,6,8,7)
ch    <- c("yes","no","yes","no","yes","no")
tot   <- data.frame(names,val1,val2,ch)


   names val1 val2  ch
1 george    2    3 yes
2   fred    3    4  no
3   bill    4    5 yes
4 george    6    6  no
5   fred    7    8 yes
6   bill    8    7  no

I want to sum val1 and val2 for each value of names, using only the rows where ch is yes, to get a new data frame like this:

   names val1 val2
1 george    2    3
2   fred    7    8
3   bill    4    5

We can group by 'names' and apply the == comparison inside summarise_at, so that only the 'val' entries whose 'ch' is 'yes' are summed:

library(dplyr)
tot %>%
    group_by(names) %>%
    summarise_at(vars(starts_with('val')), ~ sum(.[ch == 'yes']))
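In dplyr 1.0.0 and later, summarise_at is superseded by across. A minimal sketch of the same grouped sum written with across is shown below; it rebuilds the question's data so that it runs on its own, and the name res_across is only illustrative.

```r
library(dplyr)

# Same data as in the question
tot <- data.frame(
  names = c("george", "fred", "bill", "george", "fred", "bill"),
  val1  = c(2, 3, 4, 6, 7, 8),
  val2  = c(3, 4, 5, 6, 8, 7),
  ch    = c("yes", "no", "yes", "no", "yes", "no")
)

# For each group, sum only the entries of each 'val' column whose 'ch' is "yes"
res_across <- tot %>%
  group_by(names) %>%
  summarise(across(starts_with("val"), ~ sum(.x[ch == "yes"])))

res_across
```

A group with no 'yes' rows stays in the result with a sum of 0, because sum() of an empty vector is 0.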

Or filter on 'ch' first. This drops any 'names' that have no 'yes' rows, so it is better to add a complete at the end to restore them (their sums come back as NA):

library(tidyr)
tot %>%
    filter(ch == 'yes') %>%
    group_by(names) %>%
    summarise_at(vars(starts_with('val')), sum) %>%
    complete(names = unique(tot$names))
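The question's data has a 'yes' row for every name, so the effect of complete is not visible there. The sketch below adds a hypothetical extra row ("anna", with no 'yes' rows) purely for illustration, and uses the fill argument of complete to put 0 in place of the default NA. The names tot2 and res_filled are assumptions, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Question's data plus one hypothetical name ("anna") that never has ch == "yes"
tot2 <- data.frame(
  names = c("george", "fred", "bill", "george", "fred", "bill", "anna"),
  val1  = c(2, 3, 4, 6, 7, 8, 5),
  val2  = c(3, 4, 5, 6, 8, 7, 9),
  ch    = c("yes", "no", "yes", "no", "yes", "no", "no")
)

res_filled <- tot2 %>%
  filter(ch == "yes") %>%                      # "anna" is dropped here
  group_by(names) %>%
  summarise_at(vars(starts_with("val")), sum) %>%
  # restore the missing names and fill their sums with 0 instead of NA
  complete(names = unique(tot2$names), fill = list(val1 = 0, val2 = 0))

res_filled
```

Whether 0 or NA is the better placeholder depends on whether "no matching rows" should count as a zero total or as missing.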

As an alternative to the tidyverse packages, you can use the base R function aggregate:

aggregate(tot[tot$ch == "yes", 2:3], by = list(tot[tot$ch == "yes", "names"]), sum)

  Group.1 val1 val2
1    bill    4    5
2    fred    7    8
3  george    2    3
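The grouping column is labelled Group.1 because the list passed to by is unnamed. As a small sketch, naming the list element gives the column a proper name, and storing the row filter once avoids writing it twice. The variable names keep and res_named are illustrative.

```r
# Same data as in the question
tot <- data.frame(
  names = c("george", "fred", "bill", "george", "fred", "bill"),
  val1  = c(2, 3, 4, 6, 7, 8),
  val2  = c(3, 4, 5, 6, 8, 7),
  ch    = c("yes", "no", "yes", "no", "yes", "no")
)

keep <- tot$ch == "yes"   # logical filter, computed once

# A named element in 'by' becomes the name of the grouping column
res_named <- aggregate(tot[keep, c("val1", "val2")],
                       by = list(names = tot$names[keep]),
                       FUN = sum)

res_named
```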

Thanks to @akrun's suggestion, we can use aggregate with its subset argument to avoid subsetting twice. The left-hand side names val1 and val2 explicitly, so that ch is not pulled into the sum along with them:

aggregate(cbind(val1, val2) ~ names, tot, FUN = sum, subset = ch == "yes")
# or
aggregate(cbind(val1, val2) ~ names, subset(tot, ch == "yes"), sum)

   names val1 val2
1   bill    4    5
2   fred    7    8
3 george    2    3

This should be quite fast, using rowsum:

inds <- tot$ch=="yes"
rowsum(tot[inds, c("val1", "val2")], tot$names[inds])

       val1 val2
bill      4    5
fred      7    8
george    2    3
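rowsum returns the group labels as row names and sorts them alphabetically. If the result should look like the data frame in the question (a names column, in order of first appearance), one possible follow-up is sketched below. The variable names sums and res_rowsum are illustrative, and the reindexing step assumes every name has at least one 'yes' row.

```r
# Same data as in the question
tot <- data.frame(
  names = c("george", "fred", "bill", "george", "fred", "bill"),
  val1  = c(2, 3, 4, 6, 7, 8),
  val2  = c(3, 4, 5, 6, 8, 7),
  ch    = c("yes", "no", "yes", "no", "yes", "no")
)

inds <- tot$ch == "yes"
sums <- rowsum(tot[inds, c("val1", "val2")], tot$names[inds])

# Reorder rows by first appearance of each name in tot
# (assumes every name occurs among the "yes" rows)
sums <- sums[as.character(unique(tot$names)), , drop = FALSE]

# Move the row names into a regular 'names' column
res_rowsum <- data.frame(names = rownames(sums), sums, row.names = NULL)

res_rowsum
```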
