
Grouping in data.table: how to get more than 1 column of results?

I have a data.table object like this one:

library(data.table)

a <- structure(list(PERMNO = c(10006L, 10006L, 10015L, 10015L, 20000L, 20000L), 
                    SHROUT = c(1427L, 1427L, 1000L, 1001L, 200L, 200L), 
                    PRC = c(6.5, 6.125, 0.75, 0.5, 3, 4), 
                    RET = c(0.005, -0.005, -0.001, 0.05, -0.002, 0.0031)),
                   .Names = c("PERMNO", "SHROUT", "PRC", "RET"), 
               class = c("data.table", "data.frame"), row.names = c(NA, -6L))

setkey(a,PERMNO)

and I need to perform a number of calculations by PERMNO, but for this example let's suppose there are only 2:

mktcap <- a[ , tail(SHROUT,n=1)*tail(PRC,n=1),by=PERMNO]
sqret <- a[, sum(RET^2),by=PERMNO]

which produce:

> mktcap
     PERMNO       V1
[1,]  10006 8740.375
[2,]  10015  500.500
[3,]  20000  800.000

> sqret
     PERMNO        V1
[1,]  10006 5.000e-05
[2,]  10015 2.501e-03
[3,]  20000 1.361e-05

I would like to combine the two calls into one, to produce a matrix (or data.table, data.frame, whatever) with 3 columns: the first with the PERMNO values, the second with mktcap and the third with sqret.

The problem is that this grouping call (i.e. variable[ , function(), by= ] ) seems to only produce results with two columns, one with the keys and one with the results.

This is my attempt (one of many) to produce what I want:

comb.fun <- function(datai) {
     mktcap <- as.matrix(tail(datai[,1],n=1)*tail(datai[,2],n=1),ncol=1)
     sqret <- as.matrix(sum(datai[,3]^2),ncol=1)
     return(c(mktcap,sqret))
}   

myresults <- a[, comb.fun(cbind(SHROUT,PRC,RET)), by=PERMNO]

which produces:

     PERMNO           V1
[1,]  10006 8.740375e+03
[2,]  10006 5.000000e-05
[3,]  10015 5.005000e+02
[4,]  10015 2.501000e-03
[5,]  20000 8.000000e+02
[6,]  20000 1.361000e-05

(The results are all there, but they were forced into one column.) No matter what I try, I cannot get grouping to return a result with more than two columns (that is, more than one column of results).

Is it possible to get two or more columns of results with grouping in data.table?

The answer (using list() to collect the several desired summary stats) is there in the excellent Examples section of the ?data.table help file. (It's about 20 lines up from the bottom.)

out <- a[ , list(mktcap = tail(SHROUT,n=1)*tail(PRC,n=1),
                 sqret  = sum(RET^2)),
         by=PERMNO]

out
#    PERMNO   mktcap     sqret
# 1:  10006 8740.375 5.000e-05
# 2:  10015  500.500 2.501e-03
# 3:  20000  800.000 1.361e-05
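The same rule explains why the helper-function attempt in the question collapsed into one column: it returned a vector built with c(), and a vector in j becomes a single column. As a minimal sketch, assuming the question's data rebuilt with the data.table() constructor and a hypothetical helper named comb.fun2 (not part of the original post), a function that returns a named list gives one column per element:

```r
library(data.table)

# Same values as in the question, rebuilt with the data.table() constructor
a <- data.table(
  PERMNO = c(10006L, 10006L, 10015L, 10015L, 20000L, 20000L),
  SHROUT = c(1427L, 1427L, 1000L, 1001L, 200L, 200L),
  PRC    = c(6.5, 6.125, 0.75, 0.5, 3, 4),
  RET    = c(0.005, -0.005, -0.001, 0.05, -0.002, 0.0031)
)
setkey(a, PERMNO)

# Hypothetical helper: returns a named list, one element per result column
comb.fun2 <- function(shrout, prc, ret) {
  list(mktcap = tail(shrout, n = 1) * tail(prc, n = 1),
       sqret  = sum(ret^2))
}

# Because j evaluates to a list, each element becomes its own column
res <- a[, comb.fun2(SHROUT, PRC, RET), by = PERMNO]
print(res)
```

Passing the columns as separate arguments also avoids the cbind() step, which coerced everything into a single numeric matrix.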

Edit:

In the comments below, Matthew Dowle describes a simple way to clean up code in which the j argument in calls like x[i,j,by] is getting awkwardly long.

Applying his suggestion to the call above, you could instead do:

## 1) Use quote() to make an expression object out of the statement passed to j
mm <- quote(list(mktcap = tail(SHROUT,n=1)*tail(PRC,n=1),
                 sqret  = sum(RET^2)))

## 2) Use eval() to evaluate it as if it had been typed directly in the call
a[ , eval(mm), by=PERMNO]
#    PERMNO   mktcap     sqret
# 1:  10006 8740.375 5.000e-05
# 2:  10015  500.500 2.501e-03
# 3:  20000  800.000 1.361e-05
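A side benefit of storing the expression is that it can be reused across several calls. The following is only a sketch, assuming the question's data rebuilt with the data.table() constructor; the names by_permno and overall are mine, not from the original post:

```r
library(data.table)

# Same values as in the question, rebuilt with the data.table() constructor
a <- data.table(
  PERMNO = c(10006L, 10006L, 10015L, 10015L, 20000L, 20000L),
  SHROUT = c(1427L, 1427L, 1000L, 1001L, 200L, 200L),
  PRC    = c(6.5, 6.125, 0.75, 0.5, 3, 4),
  RET    = c(0.005, -0.005, -0.001, 0.05, -0.002, 0.0031)
)
setkey(a, PERMNO)

# The quoted j expression, written once
mm <- quote(list(mktcap = tail(SHROUT, n = 1) * tail(PRC, n = 1),
                 sqret  = sum(RET^2)))

# Reuse 1: one row of summaries per PERMNO
by_permno <- a[, eval(mm), by = PERMNO]

# Reuse 2: no 'by', so the whole table is treated as a single group
overall <- a[, eval(mm)]

print(by_permno)
print(overall)
```

Without by, the last-row product is taken over the whole table, so overall holds the last row's SHROUT times PRC and the sum of all squared returns.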

How about merging the two grouped results?

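comb.fun <- function(a) {
 # compute each summary separately; both results have columns PERMNO and V1
 mktcap <- a[ , tail(SHROUT,n=1)*tail(PRC,n=1),by=PERMNO]
 sqret <- a[, sum(RET^2),by=PERMNO]

 # join on PERMNO explicitly so the shared V1 column is never used as a merge key
 return(merge(mktcap,sqret,by="PERMNO"))
}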
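
To see how the merge approach compares with a single grouped call that returns a list(), here is a rough check, assuming the question's data rebuilt with the data.table() constructor. The merged result carries the default column names V1.x and V1.y, so the sketch renames them before comparing; the variable names merged and out are illustrative:

```r
library(data.table)

# Same values as in the question, rebuilt with the data.table() constructor
a <- data.table(
  PERMNO = c(10006L, 10006L, 10015L, 10015L, 20000L, 20000L),
  SHROUT = c(1427L, 1427L, 1000L, 1001L, 200L, 200L),
  PRC    = c(6.5, 6.125, 0.75, 0.5, 3, 4),
  RET    = c(0.005, -0.005, -0.001, 0.05, -0.002, 0.0031)
)
setkey(a, PERMNO)

# Merge approach: two grouped calls, then a join on PERMNO
mktcap <- a[, tail(SHROUT, n = 1) * tail(PRC, n = 1), by = PERMNO]
sqret  <- a[, sum(RET^2), by = PERMNO]
merged <- merge(mktcap, sqret, by = "PERMNO")   # columns: PERMNO, V1.x, V1.y
setnames(merged, c("V1.x", "V1.y"), c("mktcap", "sqret"))

# Single-call approach: j returns a named list
out <- a[, list(mktcap = tail(SHROUT, n = 1) * tail(PRC, n = 1),
                sqret  = sum(RET^2)),
         by = PERMNO]

# Both approaches should agree column by column
print(all.equal(merged$mktcap, out$mktcap))
print(all.equal(merged$sqret,  out$sqret))
```

The merge version scans the table once per summary and needs the rename step, so the single call with list() is usually the tidier choice when all summaries share the same grouping.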
