How to extract variables from an R expression for evaluation in a data.frame context

Question

I have expressions stored as character strings that are meant to be evaluated inside a data.table (the data.table part is not important, it is just context). To make sure all the required columns are present, I would like to extract the column names referenced in the R expression.

What I want:

library(data.table)
DT <- data.table(p001=rnorm(10),p002=rnorm(10),p003=rnorm(10))
expr <- 'p001+mean(p001,na.rm=TRUE)-weighted.mean(p002,w=p003)+someRandomOtherColumn'

# DT[,test:=p001+mean(p001,na.rm=TRUE)-weighted.mean(p002,w=p003)+someRandomOtherColumn]
# would fail as someRandomOtherColumn is not in the columns

Basically, I am looking for a way (probably a regex) to extract p001, p002, p003 and someRandomOtherColumn from expr.

My view on it: I should be able to capture p001, p001, TRUE, p002, p003 and someRandomOtherColumn with a regex that captures the contents of f(,), and then filter for 'allowed' column names (TRUE is not one of them).

Nested f(,,) calls are not an issue, since I can call the same function recursively, and nested f(,(),) calls are fine as well.

What I have: This is what I have so far. It can be made to work, but it feels wrong.

expr <- 'p001+mean(p001,na.rm=TRUE)-weighted.mean(p002,w=p003)+someRandomOtherColumn'
clean <- function(string) gsub(string, pattern='[_.a-zA-Z]+\\(([^)]*)\\)', replacement='\\1', perl=TRUE)
clean(expr)
[1] "p001+p001,na.rm=TRUE-p002,w=p003+someRandomOtherColumn"
# Then I can remove the =* parts and then split on , + - *
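
The remaining steps from that comment could be sketched roughly as follows. This is only an illustration of the regex route described above, not a robust parser: the patterns for argument names and for valid identifiers are assumptions, the list of excluded literals is hypothetical, and the `[^)]*` part of `clean` will not handle calls nested inside other calls.

```r
expr <- 'p001+mean(p001,na.rm=TRUE)-weighted.mean(p002,w=p003)+someRandomOtherColumn'

# Strip "f(" and ")" around a single, non-nested call
clean <- function(string) gsub('[_.a-zA-Z]+\\(([^)]*)\\)', '\\1', string, perl = TRUE)
flat <- clean(expr)

# Assumption: an argument name is a run of identifier characters followed by "="
flat <- gsub('[_.a-zA-Z0-9]+=', '', flat)

# Split on the separators mentioned above (plus "/")
tokens <- strsplit(flat, '[,+*/-]')[[1]]

# Keep tokens that look like names and are not logical/NA/NULL literals
is_name <- grepl('^[a-zA-Z.][a-zA-Z0-9._]*$', tokens)
is_literal <- tokens %in% c('TRUE', 'FALSE', 'NA', 'NULL')
candidates <- unique(tokens[is_name & !is_literal])
candidates
```

Anything beyond flat, single-level calls quickly needs more special cases, which is why this approach feels fragile.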

Answer 1

When you add a ~ to the front of your expression, you get a string that is a valid R formula:

expr <- '~ p001+mean(p001,na.rm=TRUE)-weighted.mean(p002,w=p003)+someRandomOtherColumn'

This string can be converted to a formula with as.formula. Afterwards, the variable names can be extracted with all.vars:

all.vars(as.formula(expr))
# [1] "p001"             "p002"             "p003"             "someRandomOtherColumn"
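
To tie this back to the original goal of checking for missing columns, here is a minimal sketch. It assumes a plain data.frame as stand-in data (with a data.table, names(DT) would be used the same way) and uses parse(text = ...) instead of adding a ~, since all.vars also accepts an unevaluated expression. The name missing_cols is made up for this example.

```r
# Hypothetical example data; the column names follow the question
DT <- data.frame(p001 = rnorm(10), p002 = rnorm(10), p003 = rnorm(10))
expr <- 'p001+mean(p001,na.rm=TRUE)-weighted.mean(p002,w=p003)+someRandomOtherColumn'

# parse() turns the string into an unevaluated expression;
# all.vars() returns the names used as variables (function names,
# argument names such as na.rm, and constants such as TRUE are not included)
needed <- all.vars(parse(text = expr))

# Columns the expression refers to that are absent from the data
missing_cols <- setdiff(needed, names(DT))
missing_cols
# [1] "someRandomOtherColumn"
```

If missing_cols is empty, the expression only refers to columns that exist. Note that T and F are ordinary names in R, so an expression written with na.rm=T would report T as a variable.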

Answer 1: accepted (3), 2017-08-25 08:07:21