R: When using data.table how do I get columns of y when I do x[y]?

UPDATE: This is an old question; it was resolved by data.table v1.5.3 in Feb 2011.

I am trying to use the data.table package and really like the speedups I am getting, but I am stumped by this error when I do x[y, <expr>], where x and y are data.tables with the same key and <expr> contains column names of both x and y:

require(data.table)
x <- data.table( foo = 1:5, a = 5:1 )
y <- data.table( foo = 1:5, boo = 10:14)
setkey(x, foo)
setkey(y, foo)
> x[y, foo*boo]
Error in eval(expr, envir, enclos) : object 'boo' not found

UPDATE: To clarify the functionality I am looking for in the above example, I need to do the equivalent of the following:

with(merge(x,y), foo*boo)

However, according to the extract below from the data.table FAQ, this should have worked:

Finally, although it appears as though x[y] does not return the columns in y, you can actually use the columns from y in the j expression. This is what we mean by join inherited scope. Why not just return the union of all the columns from x and y and then run expressions on that? It boils down to efficiency of code and what is quicker to program. When you write x[y,foo*boo], data.table automatically inspects the j expression to see which columns it uses. It will only subset, or group, those columns only. Memory is only created for the columns the j uses. Let's say foo is in x, and boo is in y (along with 20 other columns in y). Isn't x[y,foo*boo] quicker to program and quicker to run than a merge step followed by another subset step?

I am aware of this question, which addressed a similar issue, but it does not seem to have been resolved satisfactorily. Does anyone know what I am missing or misunderstanding? Thanks.

UPDATE: I asked on the data.table help mailing list, and the package author (Matthew Dowle) replied that the FAQ quoted above is indeed wrong, so the syntax I am using does not currently work, i.e. I cannot refer to the y columns in the j (i.e. second) argument when I do x[y,...].
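For anyone reading this on a current release: a minimal sketch of the expression from the question, assuming a data.table version later than v1.5.3 (where, per the update at the top, this was resolved). The i. prefix in the last line is the package's way of naming a column of the i table explicitly; the variable names joined, merged and explicit are only illustrative.

```r
library(data.table)

x <- data.table(foo = 1:5, a = 5:1)
y <- data.table(foo = 1:5, boo = 10:14)
setkey(x, foo)
setkey(y, foo)

# j can refer to columns of both x and y during the join
joined <- x[y, foo * boo]

# the merge-based equivalent from the question
merged <- with(merge(x, y), foo * boo)

# i.boo names y's column explicitly, which helps when x has a column of the same name
explicit <- x[y, foo * i.boo]
```

All three should hold the same five products of foo and boo.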

I am not sure I understand the problem well, and I have only just started reading the data.table docs, but if you would like to get the columns of y and also combine them with the a column of x, you might try something like:

> x[y,a*y]
     foo boo
[1,]   5  50
[2,]   8  44
[3,]   9  36
[4,]   8  26
[5,]   5  14

Here, you get back the columns of y multiplied by the a column of x. If you want to get x's foo multiplied by y's boo, try the following, which multiplies every column of x by boo, so the foo column of the result holds the product:

> y[,x*boo]
     foo  a
[1,]  10 50
[2,]  22 44
[3,]  36 36
[4,]  52 26
[5,]  70 14

After editing: thank you @Prasad Chalasani for making the question clearer for me.

If simple merging is preferred, then the following should work. I made up slightly more complex data to look at the behavior more closely:

x <- data.table( foo = 1:5, a=20:24, zoo = 5:1 )
y <- data.table( foo = 1:5, b=30:34, boo = 10:14)
setkey(x, foo)
setkey(y, foo)

So only one extra column was added to each data.table. Let us compare merge with doing the same thing using data.table syntax:

> system.time(merge(x,y))
   user  system elapsed 
  0.027   0.000   0.023 
> system.time(x[,list(y,x)])
   user  system elapsed 
  0.003   0.000   0.006 

The latter looks a lot faster. The results are not identical, but they can be used in the same way (the latter has an extra foo.1 column):

> merge(x,y)
     foo  a zoo  b boo
[1,]   1 20   5 30  10
[2,]   2 21   4 31  11
[3,]   3 22   3 32  12
[4,]   4 23   2 33  13
[5,]   5 24   1 34  14
> x[,list(x,y)]
     foo  a zoo foo.1  b boo
[1,]   1 20   5     1 30  10
[2,]   2 21   4     2 31  11
[3,]   3 22   3     3 32  12
[4,]   4 23   2     4 33  13
[5,]   5 24   1     5 34  14

So to get xy we might use: xy <- x[,list(x,y)]. To compute a one-column data.table from xy$foo * xy$boo, the following might work:

> xy[,foo*boo]
[1] 10 22 36 52 70

Well, the result is not a data.table but a vector.
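If a data.table is wanted rather than the vector shown above, wrapping the j expression in list() is the usual idiom. A minimal sketch, assuming a current data.table release; here xy is built with merge() so that the sketch does not depend on x[,list(x,y)], and the column name prod is made up for illustration:

```r
library(data.table)

x <- data.table(foo = 1:5, a = 20:24, zoo = 5:1)
y <- data.table(foo = 1:5, b = 30:34, boo = 10:14)

# combined table with one row per foo
xy <- merge(x, y, by = "foo")

# a bare expression in j returns a plain vector
as_vector <- xy[, foo * boo]

# wrapping j in list() returns a one-column data.table instead
as_table <- xy[, list(prod = foo * boo)]
```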


Update (29/03/2012): thanks to @David for drawing my attention to the fact that merge.data.table was used in the above examples.
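Since merge() on two data.tables dispatches to merge.data.table, the keyed join x[y] may be a more direct point of comparison. A minimal sketch, assuming a current data.table release; xy_join is an illustrative name:

```r
library(data.table)

x <- data.table(foo = 1:5, a = 20:24, zoo = 5:1)
y <- data.table(foo = 1:5, b = 30:34, boo = 10:14)
setkey(x, foo)
setkey(y, foo)

# keyed join: for each row of y, look up the matching row of x by foo
xy_join <- x[y]

# the key column appears only once, followed by the remaining columns of x and then y
print(names(xy_join))
```

Unlike x[,list(x,y)], this should not produce a duplicate foo.1 column.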
