R data.frame: rowSums of selected columns by grouping vector

I have a data frame with a sequence of numeric columns, surrounded on both sides by (irrelevant) character columns. I want to obtain a new data frame that keeps the position of the irrelevant columns and adds the numeric columns to each other according to a grouping vector (or applies some other row-wise function to the data frame, by group). Example:

sample = data.frame(cha1 = c("A","B"),num1=1:2,num2=3:4,num3=11:12,num4=13:14,cha2=c("C","D"))
> sample
  cha1 num1 num2 num3 num4 cha2
1    A    1    3   11   13    C
2    B    2    4   12   14    D

with the goal of obtaining

> goal
  cha1 X1 X2 cha2 
1    A  4 24    C
2    B  6 26    D

i.e. I've summed the 4 numeric columns according to the grouping vector gl(2,2,4) = (1,1,2,2) [levels: 1,2].
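To make the grouping explicit, here is a minimal sketch of how gl(2,2,4) maps onto the four numeric columns. The vector `num_cols` is introduced only for illustration and simply repeats the column names from the example above.

```r
# gl(n, k, length) builds a factor with n levels, each repeated k times,
# recycled to the requested length
grp <- gl(2, 2, 4)
as.integer(grp)   # 1 1 2 2
levels(grp)       # "1" "2"

# Illustrative only: pair each numeric column name with its group
num_cols <- c("num1", "num2", "num3", "num4")
cols_by_group <- split(num_cols, grp)
cols_by_group     # group 1: num1, num2; group 2: num3, num4
```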

For a purely numeric data frame I've found the following method:

sample_num = sample[,2:5] #select numeric columns
data.frame(t(apply(sample_num,1,function(row) tapply(row, INDEX=gl(2,2,4),sum))))

I could combine this with re-inserting the character columns to give the intended result, but I'm really looking for a more elegant way. I'm particularly interested in a plyr method if there is one, as I'm trying to migrate to plyr for all my data frame manipulations. I imagine the first step would be to cast the data frame into long format, but I have no idea how to proceed from there.

One 'absolute' requirement is that I cannot do without the gl(n,k,l) method of grouping, as I need this to be applicable to a wide range of data frames and grouping factors.

EDIT: for simplicity, assume that I know which columns are the relevant numeric columns. I'm not concerned with how to select them; I'm concerned with how to do my grouped sum without messing up the original data frame structure.

Thanks!

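Grpindex <- gl(2, 2, 4)  # grouping factor for the numeric columns: 1 1 2 2
# rowsum() sums matrix rows by group, hence the transpose before and after
sums <- t(rowsum(t(sample[, 2:5]), paste0("X", Grpindex)))
goal <- cbind.data.frame(sample["cha1"], sums, sample["cha2"])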

Output:

  cha1 X1 X2 cha2
1    A  4 24    C
2    B  6 26    D
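Since the question also asks for other row-wise functions and for arbitrary gl(n,k,l) groupings, the same idea could be wrapped in a small helper. This is only a sketch under stated assumptions, not part of the original answer: the function name `group_cols`, its arguments, and the data frame name `dat` are hypothetical, and it assumes `cols` holds contiguous integer column positions. It uses the base R idiom of calling split.default() on a data frame to split it column-wise.

```r
# Example data from the question (named `dat` here to avoid masking base::sample)
dat <- data.frame(cha1 = c("A", "B"), num1 = 1:2, num2 = 3:4,
                  num3 = 11:12, num4 = 13:14, cha2 = c("C", "D"))

# Hypothetical helper: collapse the columns at positions `cols` by the factor
# `grp`, applying the row-wise function FUN within each group of columns.
# Assumption: `cols` are contiguous integer positions, one group entry per column.
group_cols <- function(df, cols, grp, FUN = rowSums, prefix = "X") {
  stopifnot(length(cols) == length(grp))
  grp <- droplevels(as.factor(grp))
  # split.default() treats the data frame as a list, so it splits by column
  pieces <- split.default(df[cols], grp)
  collapsed <- as.data.frame(lapply(pieces, FUN))
  names(collapsed) <- paste0(prefix, levels(grp))
  # Columns before and after the numeric block keep their positions
  left  <- df[seq_len(min(cols) - 1)]
  right <- df[setdiff(seq_along(df), seq_len(max(cols)))]
  cbind(left, collapsed, right)
}

res_sum  <- group_cols(dat, 2:5, gl(2, 2, 4))                 # grouped row sums
res_mean <- group_cols(dat, 2:5, gl(2, 2, 4), FUN = rowMeans) # grouped row means
res_sum
```

With the default FUN = rowSums this should reproduce the goal table above; swapping in another function that maps a data frame to one value per row (such as rowMeans) changes the aggregation without touching the surrounding columns.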
