How to pass "weights" column name as a variable in R's lm?

The code below fits a linear model with R's lm, then a weighted model using a weights column. Finally, I try to pass in the weights column name through a variable, weight_col, and that fails. I'm pretty sure lm looks for "weight_col" in df, then in the caller's environment, finds a character variable of length 1, and the lengths don't match.

How do I get it to use the value of weight_col as the name of the weights column in df?

I've tried several combinations of things without success.

> df <- data.frame(
   x=c(1,2,3),
   y=c(4,5,7),
   w=c(1,3,5)
 )
> lm(y ~ x, data=df)

Call:
lm(formula = y ~ x, data = df)

Coefficients:
(Intercept)            x  
      2.333        1.500  

> lm(y ~ x, data=df, weights=w)

Call:
lm(formula = y ~ x, data = df, weights = w)

Coefficients:
(Intercept)            x  
      1.947        1.658  

> weight_col <- 'w'
> lm(y ~ x, data=df, weights=weight_col)
Error in model.frame.default(formula = y ~ x, data = df, weights = weight_col,  : 
  variable lengths differ (found for '(weights)')

> R.version.string
[1] "R version 3.6.3 (2020-02-29)"

You can index the data frame by column name with the [[ extraction operator:

lm(y ~ x, data = df, weights = df[[weight_col]])

Or you can use the function get, which works here because lm evaluates the weights argument with the columns of df in scope:

lm(y ~ x, data = df, weights = get(weight_col))
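
As a quick check, the sketch below refits the question's data three ways and compares the coefficients. The object names fit_nse, fit_extract and fit_get are made up for this illustration; everything else comes from the question. It assumes df has no column that is itself called weight_col, since such a column would be found first.

```r
# Data and column-name variable from the question.
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 7), w = c(1, 3, 5))
weight_col <- "w"

# Reference fit: the column is named directly.
fit_nse <- lm(y ~ x, data = df, weights = w)

# Column looked up by name with [[.
fit_extract <- lm(y ~ x, data = df, weights = df[[weight_col]])

# Column looked up by name with get(); lm evaluates this inside df.
fit_get <- lm(y ~ x, data = df, weights = get(weight_col))

# All three fits should agree on the coefficients.
stopifnot(
  isTRUE(all.equal(coef(fit_nse), coef(fit_extract))),
  isTRUE(all.equal(coef(fit_nse), coef(fit_get)))
)
```

Of the two, df[[weight_col]] is the more explicit form, because it does not depend on where get ends up searching.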

We can use [[ to extract the values of the column:

lm(y ~ x, data=df, weights=df[[weight_col]])

Or with tidyverse:

library(dplyr)
df %>% 
   summarise(model = list(lm(y ~ x, weights = .data[[weight_col]])))

Your first example is weights = w, which uses non-standard evaluation to find w in the context of df. So far, this is normal for interactive use.

Your second attempt is weights = weight_col, which resolves to weights = "w", and that is very different: lm needs a numeric vector with one weight per row, not a length-1 character string. Nothing in R's non-standard (or standard) evaluation turns that string into the column.
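
The lookup described above can be reproduced by hand. This is only a sketch of the idea, not lm's actual internals: eval with df as the first scope and the calling environment as the fallback stands in for what lm does with the weights argument. The names seen_by_lm and wts are invented for the illustration.

```r
# Data and column-name variable from the question.
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 7), w = c(1, 3, 5))
weight_col <- "w"

# Look the symbol up in df first, then in the calling environment.
# df has no column called weight_col, so the variable holding the
# string is what gets found.
seen_by_lm <- eval(quote(weight_col), df, environment())

# A single character string, while df has three rows: hence the
# "variable lengths differ" error.
stopifnot(is.character(seen_by_lm), length(seen_by_lm) == 1, nrow(df) == 3)

# Indexing by name gives the numeric vector that lm needs.
wts <- df[[weight_col]]
stopifnot(is.numeric(wts), length(wts) == nrow(df))
```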

As I said in my comment, use the standard-evaluation form with [[.

lm(y ~ x, data=df, weights=df[[weight_col]])
# Call:
# lm(formula = y ~ x, data = df, weights = df[[weight_col]])
# Coefficients:
# (Intercept)            x  
#       1.947        1.658  
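
One caveat if this call is moved inside a function: lm evaluates weights in data and then in the formula's environment, so a data frame or vector that exists only as a local variable of the wrapper may not be found. The sketch below is one possible way around that. fit_weighted is a hypothetical helper, not part of base R; it builds the call with the column's symbol substituted in and evaluates it inside the function.

```r
# Hypothetical helper (not part of base R): fit a weighted lm where the
# weights column is given as a string.
fit_weighted <- function(formula, data, weight_col) {
  stopifnot(
    is.character(weight_col),
    length(weight_col) == 1,
    weight_col %in% names(data)
  )
  # Build lm(<formula>, data = data, weights = <column symbol>).
  # The formula is inserted as a value and the column as a symbol,
  # so lm finds the weights inside `data` as in the interactive case.
  call <- bquote(lm(.(formula), data = data, weights = .(as.name(weight_col))))
  # Evaluate in this function's frame so `data` refers to the argument.
  eval(call)
}

# Usage with the data from the question.
df <- data.frame(x = c(1, 2, 3), y = c(4, 5, 7), w = c(1, 3, 5))
fit <- fit_weighted(y ~ x, df, "w")
reference <- lm(y ~ x, data = df, weights = w)
stopifnot(isTRUE(all.equal(coef(fit), coef(reference))))
```

For one-off interactive use, the plain df[[weight_col]] form above is simpler and enough.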
