
"Vectorizing" this for-loop in R? (suppressing interaction main effects in lm)

When interactions are specified in lm, R includes main effects by default, with no option to suppress them. This is usually appropriate and convenient, but there are certain cases (within estimators and ratio LHS variables, among others) where it isn't.

I've got this code that regresses a response variable on a log-transformed variable, independently within subsets of the data.

Here is a silly yet reproducible example:

id = as.factor(c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,6,7,7,8,8,8,9,9,9,9,10))
x = rexp(length(id))
y = rnorm(length(id))
logx = log(x)
data = data.frame(id,y,logx)

for (i in levels(data$id)){   #Visit each id once (looping over data$id refits once per row)
    sub = subset(data, id==i)   #This splits the data by id
    m = lm(y~logx-1,data=sub)   #This gives me the linear (log) fit for one of my id's
    sub$x.tilde = log(1+3)*coef(m)   #This linearizes it and gives me the expected value for x=3
    data$x.tilde[data$id==i] = sub$x.tilde #This puts it back into the main dataset
    data$tildecoeff[data$id==i] = coef(m) #This saves the coefficient (I use it elsewhere for plotting)
    }

I want to fit a model like the following:

Y = B(X*id) + e

with no intercept and no main effect of id. As you can see from the loop, I'm interested in the expectation of Y when X=3, with the fit constrained through the origin (because Y is a (logged) ratio of Y[X=something]/Y[X=0]).

But if I specify

m = lm(Y~X*as.factor(id)-1)

there is no means of suppressing the main effects of id. I need to run this loop several hundred times in an iterative algorithm, and as a loop it is far too slow.

The other upside of de-looping this code is that it'll be much more convenient to get prediction intervals.

(Please, I don't need pious comments about how leaving out main effects and intercepts is improper. It usually is, but I can promise that it isn't in this instance.)

Thanks in advance for any ideas!

I think you want

m <- lm(y ~ 0 + logx : as.factor(id))

See R-intro, section 11.1, 'Defining statistical models; formulae'.
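To make the connection to the loop in the question concrete, here is a minimal sketch (not a drop-in for your real data) that fits the single `:` model on the toy example and checks that its slopes agree with separate per-id fits through the origin. The names `slopes`, `loop_slopes`, `newdata` and `pred` are mine, `set.seed(1)` is only there for repeatability, and the `log(1 + 3)` constant is copied from the loop as-is. One caveat I'd flag: the single model pools one residual variance across all ids, so the standard errors and the intervals from `predict` will not match what separate per-id fits would give, even though the point estimates do.

```r
# Toy data with the same shape as the example in the question
set.seed(1)
id   <- as.factor(c(1,2,2,3,3,3,4,4,4,4,5,5,5,5,6,7,7,8,8,8,9,9,9,9,10))
x    <- rexp(length(id))
y    <- rnorm(length(id))
data <- data.frame(id, y, logx = log(x))

# One model: a separate slope on logx for each id,
# with no intercept and no main effect of id
m <- lm(y ~ 0 + logx:id, data = data)

# Coefficients are named "logx:id1", "logx:id2", ...;
# strip the prefix so they can be looked up by id level
slopes <- coef(m)
names(slopes) <- sub("^logx:id", "", names(slopes))

# Vectorized replacement for the two columns the loop fills in
data$tildecoeff <- unname(slopes[as.character(data$id)])
data$x.tilde    <- log(1 + 3) * data$tildecoeff

# Sanity check against separate per-id fits through the origin
loop_slopes <- sapply(split(data, data$id),
                      function(d) coef(lm(y ~ logx - 1, data = d)))
same_slopes <- isTRUE(all.equal(unname(slopes), unname(loop_slopes)))

# Prediction intervals for every id at the same logx value,
# based on the pooled residual variance of the single model
newdata <- data.frame(id   = factor(levels(data$id), levels = levels(data$id)),
                      logx = log(1 + 3))
pred <- predict(m, newdata = newdata, interval = "prediction")
```

`pred` is a matrix with columns `fit`, `lwr` and `upr`, one row per id, and `same_slopes` tells you whether the one-shot fit reproduces the loop's coefficients. If you need per-id residual variances instead of the pooled one, this single-model approach is probably not the right tool.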
