Grouping in data.table: how to get more than 1 column of results?
I have a data.table object like this one:
library(data.table)
a <- structure(list(PERMNO = c(10006L, 10006L, 10015L, 10015L, 20000L, 20000L),
SHROUT = c(1427L, 1427L, 1000L, 1001L, 200L, 200L),
PRC = c(6.5, 6.125, 0.75, 0.5, 3, 4),
RET = c(0.005, -0.005, -0.001, 0.05, -0.002, 0.0031)),
.Names = c("PERMNO", "SHROUT", "PRC", "RET"),
class = c("data.table", "data.frame"), row.names = c(NA, -6L))
setkey(a,PERMNO)
and I need to perform a number of calculations by PERMNO, but for this example let's suppose there are only 2:
mktcap <- a[ , tail(SHROUT,n=1)*tail(PRC,n=1),by=PERMNO]
sqret <- a[, sum(RET^2),by=PERMNO]
which produce
> mktcap
PERMNO V1
[1,] 10006 8740.375
[2,] 10015 500.500
[3,] 20000 800.000
> sqret
PERMNO V1
[1,] 10006 5.000e-05
[2,] 10015 2.501e-03
[3,] 20000 1.361e-05
I would like to combine the two calls into one, to produce a matrix (or data.table, data.frame, whatever) with 3 columns: the first with the PERMNOs, the second with mktcap and the third with sqret.
The problem is that this grouping function (i.e. variable[ , function(), by= ]) seems to only produce results with two columns, one with the keys and one with the results.
This is my attempt (one of many) to produce what I want:
comb.fun <- function(datai) {
mktcap <- as.matrix(tail(datai[,1],n=1)*tail(datai[,2],n=1),ncol=1)
sqret <- as.matrix(sum(datai[,3]^2),ncol=1)
return(c(mktcap,sqret))
}
myresults <- a[, comb.fun(cbind(SHROUT,PRC,RET)), by=PERMNO]
which produces
PERMNO V1
[1,] 10006 8.740375e+03
[2,] 10006 5.000000e-05
[3,] 10015 5.005000e+02
[4,] 10015 2.501000e-03
[5,] 20000 8.000000e+02
[6,] 20000 1.361000e-05
(the results are all there, but they were forced into one column). No matter what I try, I cannot get grouping to return a matrix with more than two columns (or more than one column of results).
Is it possible to get two or more columns of results with grouping in data.table?
The answer (using list() to collect the several desired summary stats) is there in the excellent Examples section of the ?data.table help file. (It's about 20 lines up from the bottom.)
out <- a[ , list(mktcap = tail(SHROUT,n=1)*tail(PRC,n=1),
sqret = sum(RET^2)),
by=PERMNO]
out
# PERMNO mktcap sqret
# 1: 10006 8740.375 5.000e-05
# 2: 10015 500.500 2.501e-03
# 3: 20000 800.000 1.361e-05
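The same idea should carry over to a helper function like the one in the question: if the function called in j returns a list instead of c(...), each list element becomes its own column. Here is a minimal sketch, assuming a hypothetical rewrite of comb.fun (called comb_fun here, taking the three columns as plain vectors) and rebuilding the sample table with data.table() so the block runs on its own:

```r
library(data.table)

# Same sample data as in the question, rebuilt with data.table()
a <- data.table(
  PERMNO = c(10006L, 10006L, 10015L, 10015L, 20000L, 20000L),
  SHROUT = c(1427L, 1427L, 1000L, 1001L, 200L, 200L),
  PRC    = c(6.5, 6.125, 0.75, 0.5, 3, 4),
  RET    = c(0.005, -0.005, -0.001, 0.05, -0.002, 0.0031),
  key    = "PERMNO"
)

# Hypothetical helper: returns a named list, one element per result column
comb_fun <- function(shrout, prc, ret) {
  list(mktcap = tail(shrout, n = 1) * tail(prc, n = 1),
       sqret  = sum(ret^2))
}

# One row per PERMNO, with columns PERMNO, mktcap and sqret
res <- a[, comb_fun(SHROUT, PRC, RET), by = PERMNO]
res
```

Returning c(mktcap, sqret) would give a length-2 vector per group, which is why the original attempt stacked both numbers in a single V1 column.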
Edit:
In the comments below, Matthew Dowle describes a simple way to clean up code in which the j argument in calls like x[i,j,by] is getting awkwardly long.
Implementing his suggestion on the call above, you could instead do:
## 1) Use quote() to make an expression object out of the statement passed to j
mm <- quote(list(mktcap = tail(SHROUT,n=1)*tail(PRC,n=1),
sqret = sum(RET^2)))
## 2) Use eval() to evaluate it as if it had been typed directly in the call
a[ , eval(mm), by=PERMNO]
# PERMNO mktcap sqret
# 1: 10006 8740.375 5.000e-05
# 2: 10015 500.500 2.501e-03
# 3: 20000 800.000 1.361e-05
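One side benefit of storing the expression is that it can be reused in other calls. The sketch below is only an illustration: the PRC > 1 filter is an arbitrary condition I picked, not something from the question, and the table is rebuilt with data.table() so the block is self-contained.

```r
library(data.table)

# Same sample data as in the question, rebuilt with data.table()
a <- data.table(
  PERMNO = c(10006L, 10006L, 10015L, 10015L, 20000L, 20000L),
  SHROUT = c(1427L, 1427L, 1000L, 1001L, 200L, 200L),
  PRC    = c(6.5, 6.125, 0.75, 0.5, 3, 4),
  RET    = c(0.005, -0.005, -0.001, 0.05, -0.002, 0.0031),
  key    = "PERMNO"
)

# Store the j expression once ...
mm <- quote(list(mktcap = tail(SHROUT, n = 1) * tail(PRC, n = 1),
                 sqret  = sum(RET^2)))

# ... then reuse it on the full table and on a subset of rows
full   <- a[, eval(mm), by = PERMNO]
subset <- a[PRC > 1, eval(mm), by = PERMNO]  # illustrative filter only
```

With this sample data, only PERMNO 10006 and 20000 have rows with PRC > 1, so the second result has two groups.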
How about:
comb.fun <- function(a) {
mktcap <- a[ , tail(SHROUT,n=1)*tail(PRC,n=1),by=PERMNO]
sqret <- a[, sum(RET^2),by=PERMNO]
return(merge(mktcap,sqret))
}
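With this merge approach, both intermediate results carry an unnamed V1 column, so as far as I can tell merge() has to disambiguate them with suffixes. A small variation, sketched here with a hypothetical function name (comb_fun_named) and an explicit by = "PERMNO", names each column before merging:

```r
library(data.table)

# Same sample data as in the question, rebuilt with data.table()
a <- data.table(
  PERMNO = c(10006L, 10006L, 10015L, 10015L, 20000L, 20000L),
  SHROUT = c(1427L, 1427L, 1000L, 1001L, 200L, 200L),
  PRC    = c(6.5, 6.125, 0.75, 0.5, 3, 4),
  RET    = c(0.005, -0.005, -0.001, 0.05, -0.002, 0.0031),
  key    = "PERMNO"
)

# Hypothetical variant: name each summary column, then merge on PERMNO
comb_fun_named <- function(dt) {
  mktcap <- dt[, list(mktcap = tail(SHROUT, n = 1) * tail(PRC, n = 1)), by = PERMNO]
  sqret  <- dt[, list(sqret = sum(RET^2)), by = PERMNO]
  merge(mktcap, sqret, by = "PERMNO")
}

merged <- comb_fun_named(a)
merged
```

This groups the table twice and then joins, so the single list() call in j is likely the simpler option when all the summaries share the same grouping.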