How to subset data.table by external function with arbitrary conditions

Suppose I have a data.table like the following.

library(data.table)

# Build every combination of the three sequences
a <- seq(2)
b <- seq(3)
c <- seq(4)
dt <- data.table(expand.grid(a,b,c))
dt
    Var1 Var2 Var3
 1:    1    1    1
 2:    2    1    1
 3:    1    2    1
 4:    2    2    1
 5:    1    3    1
 6:    2    3    1
 7:    1    1    2
 8:    2    1    2
 9:    1    2    2
10:    2    2    2
11:    1    3    2
12:    2    3    2
13:    1    1    3
14:    2    1    3
15:    1    2    3
16:    2    2    3
17:    1    3    3
18:    2    3    3
19:    1    1    4
20:    2    1    4
21:    1    2    4
22:    2    2    4
23:    1    3    4
24:    2    3    4

Now I can easily subset by column values using a standard data.table subset call. For example,

dt[Var2==2 & Var3==1]
   Var1 Var2 Var3
1:    1    2    1
2:    2    2    1

But now suppose I wanted to create a function outside of the data.table, something generically like

foo <- function(dt, ...) {
  return(dt[Var2 == 2 & Var3 == 1])
}

I have seen some examples that use only one subset column together with globalenv()$val, where you define Var2 outside of the data.table filter.

foo <- function(dt, ...) {
  # Var2 on the left is the column; globalenv()$Var2 is a variable defined globally
  return(dt[Var2 == globalenv()$Var2])
}

But if I had a large number of columns and wanted to filter by an arbitrary subset of the columns and values, this doesn't seem to offer a simple solution. I can do this a few ways, but they all seem very cumbersome and inefficient. Is there a way to subset with a function that takes arbitrary columns and values selected by the user?

Like,

foo(dt,Var2=1,Var3=1)
foo(dt,Var1=2,Var3=1,Var10=2,...)
foo(dt,c(Var1=2,Var3=1,Var10=2))

etc

I added the extra dots since I want to be able to pass any number of arbitrary selection conditions in the function call.

In case anyone is wondering, my end goal is a much larger function, but the data.table filtering is a critical portion of it.

A slight modification of Christian's answer:

fun <- function(dt, ...) {
    args <- list(...)
    filter <- Reduce(
        function(x, y) call("&", x, y),
        Map(function(val, name) call("==", as.name(name), val), args, names(args)))
    dt[eval(filter)]
}

fun(dt, Var1 = 1, Var3 = 1)
#   Var1 Var2 Var3
#1:    1    1    1
#2:    1    2    1
#3:    1    3    1
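
To make it easier to see what the answer above passes to data.table, here is a minimal sketch that only builds the filter expression, using base R alone. The helper name build_filter is hypothetical, and treating a single unnamed argument as a named vector or list of conditions (the third call form in the question) is an assumption added for illustration, not part of the original answer.

```r
# Hypothetical helper: turn named values into a call such as Var1 == 1 & Var3 == 1.
build_filter <- function(...) {
  args <- list(...)
  # Assumption: one unnamed argument is a named vector/list holding the conditions.
  if (length(args) == 1L && is.null(names(args))) {
    args <- as.list(args[[1L]])
  }
  stopifnot(length(args) > 0L, !is.null(names(args)), all(nzchar(names(args))))
  # One `column == value` call per condition ...
  conditions <- Map(function(val, name) call("==", as.name(name), val),
                    args, names(args))
  # ... chained together with `&`.
  Reduce(function(x, y) call("&", x, y), conditions)
}

f1 <- build_filter(Var1 = 1, Var3 = 1)
print(f1)
# Var1 == 1 & Var3 == 1

f2 <- build_filter(c(Var1 = 2, Var3 = 1, Var10 = 2))
print(f2)
```

The resulting call is the same kind of object that the answer evaluates with dt[eval(filter)], so it should be usable in the same way. Note that every condition is an equality test; other comparisons would need a different construction.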

One possible solution (note the use of == rather than the = used in the post):

foo = function(dt, ...) {
  eval(substitute(dt[Reduce(`&`, list(...)),]))
}

foo(dt,Var2==1,Var3==1)

    Var1  Var2  Var3
   <int> <int> <int>
1:     1     1     1
2:     2     1     1
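
The solution above relies on substitute() replacing ... with the unevaluated conditions. The following sketch shows that step in isolation on a plain data frame, so it runs without data.table. The names capture_conditions, df, and keep are hypothetical and exist only for this illustration.

```r
# Hypothetical helper: return the conditions passed via ... as an unevaluated call.
capture_conditions <- function(...) substitute(list(...))

expr <- capture_conditions(Var2 == 1, Var3 == 1)
print(expr)
# list(Var2 == 1, Var3 == 1)

# A plain data frame with the same shape as the question's table.
df <- expand.grid(Var1 = 1:2, Var2 = 1:3, Var3 = 1:4)

# Evaluate each condition with the columns in scope, then combine with `&`.
keep <- Reduce(`&`, eval(expr, df))
df[keep, ]
```

Because the conditions are ordinary R expressions here, any comparison (such as Var3 >= 2) can be passed, not only equality.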
