R data.table: How to sum variables by group based on a condition?

Let's say I have the following R data.table (though I'm happy to work with base R or a data.frame as well):

library(data.table)

dt = data.table(Category=c("First","First","First","Second","Third", "Third", "Second"), Frequency=c(10,15,5,2,14,20,3), times = c(0, 0, 0, 3, 3, 1, 0))

> dt
   Category Frequency times
1:    First        10     0
2:    First        15     0
3:    First         5     0
4:   Second         2     3
5:    Third        14     3
6:    Third        20     1
7:   Second         3     0

If I wished to sum Frequency by Category, I would use the following:

dt[, sum(Frequency), by = Category]

However, let's say I wanted to sum Frequency by Category if and only if times is non-zero and not NA.

How would one make this sum conditional on the values of a separate column?

EDIT: Apologies for the obvious question. A quick addition: what if the elements of a certain column are strings?

e.g.

> dt
   Category Frequency times
1:    First        ten    0
2:    First        ten    0
3:    First        five   0
4:   Second        five   3
5:    Third        five   3
6:    Third        five   1
7:   Second        ten    0

sum() will not count the occurrences of ten versus five.
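For the string case, one possible approach is to count rows rather than sum values. The sketch below assumes the goal is the number of occurrences of each Frequency string per Category, restricted to rows where times is non-zero and not NA; the object names dt_chr and counts are illustrative and not from the original post.

```r
library(data.table)

# Toy data mirroring the second table in the question (Frequency holds strings).
dt_chr <- data.table(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c("ten", "ten", "five", "five", "five", "five", "ten"),
  times     = c(0, 0, 0, 3, 3, 1, 0)
)

# Keep rows where times is non-zero and not NA, then count rows (.N)
# for each Category/Frequency combination.
counts <- dt_chr[times != 0 & !is.na(times), .N, by = .(Category, Frequency)]
print(counts)
```

.N is data.table's built-in row counter for the current group, so no numeric column is needed.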

Remember the logic of data.table: dt[i, j, by], that is, take dt, subset rows using i, then calculate j grouped by by.

dt[times != 0 & !is.na(times), sum(Frequency), by = Category]
   Category V1
1:   Second  2
2:    Third 34
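As a variation on the call above, the sketch below names the result column and includes an NA in times to show that such rows are excluded. The NA in row 6 and the names dt_na, res and total are assumptions added for illustration; they are not part of the original data.

```r
library(data.table)

# Same layout as the question's table, but with an NA placed in times (row 6)
# purely to illustrate how missing values are filtered out.
dt_na <- data.table(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  times     = c(0, 0, 0, 3, 3, NA, 0)
)

# i: keep rows with non-zero, non-missing times
# j: sum Frequency, returned in a column named 'total'
# by: one result row per Category
res <- dt_na[times != 0 & !is.na(times), .(total = sum(Frequency)), by = Category]
print(res)
```

Wrapping j in .() returns a named column instead of the default V1.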

You can use bracket subsetting to select only the rows where times is non-zero and not NA, then run the grouped operation.

dt[which(dt$times > 0)][, sum(Frequency), by = Category]

You can use rowsum() for this.

rowsum

Give Column Sums of a Matrix or Data Frame, Based on a Grouping Variable

Compute column sums across rows of a numeric matrix-like object for each level of a grouping variable. rowsum is generic, with a method for data frames and a default method for vectors and matrices.

Keywords: manip

Usage

rowsum(x, group, reorder = TRUE, ...)

## S3 method for class 'data.frame'
rowsum(x, group, reorder = TRUE, na.rm = FALSE, ...)

## Default S3 method:
rowsum(x, group, reorder = TRUE, na.rm = FALSE, ...)

Arguments

x

a matrix, data frame or vector of numeric data. Missing values are allowed. A numeric vector will be treated as a column vector.

group

a vector or factor giving the grouping, with one element per row of x. Missing values will be treated as another group and a warning will be given.

reorder

if TRUE, then the result will be in order of sort(unique(group)); if FALSE, it will be in the order that groups were encountered.

na.rm

logical (TRUE or FALSE). Should NA (including NaN) values be discarded?

...

other arguments to be passed to or from methods

Details

The default is to reorder the rows to agree with tapply as in the example below. Reordering should not add noticeably to the time except when there are very many distinct values of group and x has few columns.

The original function was written by Terry Therneau, but this is a new implementation using hashing that is much faster for large matrices.

To sum over all the rows of a matrix (i.e., a single group) use colSums, which should be even faster.

For integer arguments, over/underflow in forming the sum results in NA.

Value

A matrix or data frame containing the sums. There will be one row per unique value of group.
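Applied to the data in the question, rowsum() might be used as in the sketch below. It uses base R only; since rowsum() has no filtering argument, the rows are subset first. The names df, keep and sums are illustrative and not from the original answer.

```r
# Base R only; df mirrors the table in the question.
df <- data.frame(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3),
  times     = c(0, 0, 0, 3, 3, 1, 0),
  stringsAsFactors = FALSE
)

# Logical index of rows where times is non-zero and not NA.
keep <- !is.na(df$times) & df$times != 0

# Sum Frequency within each Category for the kept rows.
# The result is a one-column matrix with one row per group.
sums <- rowsum(df$Frequency[keep], group = df$Category[keep])
print(sums)
```

The group labels become the row names of the result, so a single total can be read with, for example, sums["Third", 1].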
