Using by with data.table
Here I am trying to use the by parameter in data.table to rank the prediction column within each group. I haven't been able to understand why the following piece of code isn't working:
> x.small
prediction group
1: -0.0093753015 up
2: 0.0204832283 down
3: -0.0091790179 down
4: -0.0473988803 down
5: 0.0144955868 down
6: -0.0139455871 down
7: 0.0005746896 up
8: -0.0174406693 down
9: -0.0180556244 down
10: -0.0343069464 up
> x.small[, rank(prediction), by=group]
Error in rank(prediction) :
'names' attribute [7] must be the same length as the vector [3]
But this example code works fine:
> diamonds.dt <- data.table(diamonds[1:10, c('carat', 'color')])
> diamonds.dt
carat color
1: 0.23 E
2: 0.21 E
3: 0.23 E
4: 0.29 I
5: 0.31 J
6: 0.24 J
7: 0.24 I
8: 0.26 H
9: 0.22 E
10: 0.23 H
> diamonds.dt[, rank(carat), by=color]
color V1
1: E 3.5
2: E 1.0
3: E 3.5
4: E 2.0
5: I 2.0
6: I 1.0
7: J 2.0
8: J 1.0
9: H 2.0
10: H 1.0
Any help would be much appreciated!
EDIT:
Okay, now I really have no idea what's going on; this is very bizarre. I tried making a reproducible example for @Ananda but could not recreate the error. I even tried running the ranking logic on an exact copy of the prediction column, and it worked fine:
> x.small[, prediction.copy:=prediction]
> x.small[, rank(prediction.copy), by=group]
group V1
1: up 2
2: up 3
3: up 1
4: down 7
5: down 5
6: down 1
7: down 6
8: down 4
9: down 3
10: down 2
> x.small[, rank(prediction), by=group]
Error in rank(prediction) :
'names' attribute [7] must be the same length as the vector [3]
How could there be two different results from two identical columns?
EDIT 2:
Output of dput(x.small):
> dput(x.small)
structure(list(prediction = structure(c(-0.00937530151309606,
0.0204832283018108, -0.00917901792827827, -0.0473988802836657,
0.0144955868466372, -0.0139455871394683, 0.000574689607249577,
-0.0174406692627376, -0.0180556244204637, -0.0343069463869563
), .Names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"
)), group = c("up", "down", "down", "down", "down", "down", "up",
"down", "down", "up"), prediction.copy = c(-0.00937530151309606,
0.0204832283018108, -0.00917901792827827, -0.0473988802836657,
0.0144955868466372, -0.0139455871394683, 0.000574689607249577,
-0.0174406692627376, -0.0180556244204637, -0.0343069463869563
)), .Names = c("prediction", "group", "prediction.copy"), row.names = c(NA,
-10L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x22f2af8>)
I guess I'll just close this one. If you are having the same issue, check whether the problem column is a named vector by running str(x.small) and seeing whether its entry starts with the word "Named". That is also why the two "identical" columns behave differently: the dput() output shows that prediction carries a .Names attribute, while prediction.copy does not. The numbers in the error message fit this too: 7 and 3 are the sizes of the down and up groups. For some reason, using the by parameter on a named vector causes this error.
This appears to be a minor bug in earlier versions of data.table that was patched in later versions. To fix it, upgrade data.table or, as @Frank suggested, use unname():
x.small[,rank(unname(prediction)), by=group]
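If several columns might carry names, the same diagnosis and fix can be applied to the whole table at once. This is a minimal sketch assuming a reasonably current data.table; the helper strip_column_names() and the toy data are hypothetical, made up for illustration rather than taken from the original post. It finds columns with a names attribute, strips it with unname() via set(), and then ranks within each group.

```r
library(data.table)

# Hypothetical helper (not a data.table function): removes the names attribute
# from every column that has one. set() replaces each column by reference, so
# the data.table passed in is modified in place.
strip_column_names <- function(dt) {
  for (col in names(dt)) {
    if (!is.null(names(dt[[col]]))) {
      set(dt, j = col, value = unname(dt[[col]]))
    }
  }
  invisible(dt)
}

# Toy data in the shape of x.small. setNames() attaches a names attribute
# similar to the .Names in the dput() output; depending on the data.table
# version, the constructor may or may not keep it.
x <- data.table(
  prediction = setNames(c(-0.0094, 0.0205, -0.0092, -0.0474, 0.0145, -0.0139, 0.0006),
                        as.character(1:7)),
  group = c("up", "down", "down", "down", "down", "down", "up")
)

# Columns that carry names (str() labels these "Named num", etc.)
named_before <- names(x)[vapply(x, function(col) !is.null(names(col)), logical(1))]
print(named_before)

strip_column_names(x)

# With the names gone, ranking within each group works as in the diamonds example
x[, rank_in_group := rank(prediction), by = group]
print(x)
```

The helper works whether or not any names survived construction, so it is safe to run on any table before grouped operations.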
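Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.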