R - Group data but apply different functions to different columns

I'd like to group this data, but apply different functions to different columns when grouping.

ID  type isDesc isImage
1   1    1      0
1   1    0      1
1   1    0      1
4   2    0      1
4   2    1      0
6   1    1      0
6   1    0      1
6   1    0      0
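
For anyone who wants to reproduce this, the table above can be rebuilt as a data frame. This is only a sketch: the name `data` is chosen to match the code further down, and storing the columns as plain numerics is an assumption, since the original column types are not shown.

```r
# Rebuild the example table as a data frame named `data`
# (numeric column types are an assumption)
data <- data.frame(
  ID      = c(1, 1, 1, 4, 4, 6, 6, 6),
  type    = c(1, 1, 1, 2, 2, 1, 1, 1),
  isDesc  = c(1, 0, 0, 0, 1, 1, 0, 0),
  isImage = c(0, 1, 1, 1, 0, 0, 1, 0)
)

# Count the distinct values of type within each ID;
# a count of 1 everywhere means type is constant per group
type_counts <- tapply(data$type, data$ID, function(x) length(unique(x)))
print(type_counts)
```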

I want to group by ID. The columns isDesc and isImage can be summed, but I would like to keep the value of type as it is; type is the same for all rows with a given ID. The result should look like this:

ID  type isDesc isImage
1   1    1      2
4   2    1      1
6   1    1      1

Currently I am using

library(plyr)
summarized = ddply(data, .(ID), numcolwise(sum))

but it simply sums up all the columns, including type. You don't have to use ddply, but if you think it's good for the job I'd like to stick with it. The data.table library is also an option.

Using data.table:

library(data.table)
dt <- data.table(data, key="ID")
# Take the first type per ID and sum the two flag columns
dt[, list(type=type[1], isDesc=sum(isDesc),
          isImage=sum(isImage)), by=ID]

#    ID type isDesc isImage
# 1:  1    1      1       2
# 2:  4    2      1       1
# 3:  6    1      1       1

Using plyr:

ddply(data , .(ID), summarise, type=type[1], isDesc=sum(isDesc), isImage=sum(isImage))
#   ID type isDesc isImage
# 1  1    1      1       2
# 2  4    2      1       1
# 3  6    1      1       1

Edit: If you have many columns to sum, and other columns for which you only want the first value, you can use data.table's .SDcols:

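# Sum the columns to be totalled, and take the first value of the rest
dt1 <- dt[, lapply(.SD, sum), by=ID, .SDcols=c(3,4)]
dt2 <- dt[, lapply(.SD, head, 1), by=ID, .SDcols=c(2)]
# Join the two results on ID
dt2[dt1, on="ID"]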
#    ID type isDesc isImage
# 1:  1    1      1       2
# 2:  4    2      1       1
# 3:  6    1      1       1

You can pass either column names or column numbers to .SDcols. For example, .SDcols=c("type") is also valid.
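
As a variation on the .SDcols approach, the two steps can probably be folded into a single call by passing column names and combining the first-value column with the summed columns in one list. This is a sketch rather than part of the original answer: `sum_cols` and `res` are names introduced here for illustration, and the table is rebuilt inline so the block runs on its own.

```r
library(data.table)

# Example data from the question, rebuilt so this block is self-contained
dt <- data.table(
  ID      = c(1, 1, 1, 4, 4, 6, 6, 6),
  type    = c(1, 1, 1, 2, 2, 1, 1, 1),
  isDesc  = c(1, 0, 0, 0, 1, 1, 0, 0),
  isImage = c(0, 1, 1, 1, 0, 0, 1, 0)
)

# Columns to sum, given by name (sum_cols is an illustrative helper name)
sum_cols <- c("isDesc", "isImage")

# .SD holds only the columns in .SDcols, but type is still visible in j,
# so the first type per group can be combined with the per-group sums
res <- dt[, c(list(type = type[1]), lapply(.SD, sum)),
          by = ID, .SDcols = sum_cols]
print(res)
```

Using names instead of positions keeps the call working if the column order changes, and the single pass avoids the join between dt1 and dt2.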
