Regression model with all possible two-way interaction terms in R
I have a data set with 8 variables. I need all possible two-way interaction terms, one at a time, along with the seven predictors in each model. So in my case there will be 7C2 = 21 models in total, each containing the 7 predictors and a single two-way interaction term.
I tried to produce the 21 models with a for loop, but the code fails at the lm() call inside the loop. In my problem, return is the response variable, stored in the 5th column of my data.
colnames(dt) = c("assets","turnover_ratio","SD","sharpe_ratio","return",
"expense_ratio","fund_dummy","risk_dummy")
vars=colnames(dt)[-5]
for (i in vars) {
for (j in vars) {
if (i != j) {
factor= paste(i,j,sep='*')}
lm.fit <- lm(paste("return ~", factor), data=dt)
print(summary(lm.fit))
}}
The code gives the following error message:
Error in paste("return ~", factor): cannot coerce type 'closure' to vector of type 'character'
This is my data set:
The output below is the desired output, and 20 more such models are needed for the other possible two-way interaction terms. All 7 predictors should be present in each model. The only thing that should change is the two-way interaction term.
This is one of the 21 desired outputs:
Your problem is where the if statement ends: its closing brace comes before the lm() call, so lm() also runs when i equals j. This code should work:
colnames(dt) = c("assets","turnover_ratio","SD","sharpe_ratio","return",
"expense_ratio","fund_dummy","risk_dummy")
vars=colnames(dt)[-5]
for (i in vars) {
for (j in vars) {
if (i != j) {
factor= paste(i,j,sep='*')
lm.fit <- lm(paste0("return ~", factor), data=dt)
print(summary(lm.fit))
}
}
}
The problem was that in the first iteration (where i equals j) the variable factor had not been defined yet, so paste() received the base R function factor instead of a string. Also, try not to name a variable factor, since factor is a function in R.
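To illustrate where the 'closure' message comes from, here is a minimal sketch that can be run in a fresh R session. The names res, is_error, inter_term and fmla_text are made up for this illustration and are not part of the original code.

```r
# With no user variable called `factor`, the name resolves to base::factor,
# which is a function (a "closure"), so paste() cannot turn it into text.
res <- tryCatch(
  paste("return ~", factor),
  error = function(e) e
)
is_error <- inherits(res, "error")
print(is_error)

# Using a name that does not clash with a base function avoids the ambiguity.
inter_term <- paste("assets", "SD", sep = "*")
fmla_text  <- paste("return ~", inter_term)
print(fmla_text)
```

Once the string is built before lm() is called, and under a name other than factor, the same paste() call succeeds.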
The following apply loop gets all pairwise interactions between the 7 variables. The 21 pairs are first obtained with combn.
vars <- colnames(dt)[-5]
resp <- colnames(dt)[5]
cmb <- combn(vars, 2)
lm_list <- apply(cmb, 2, function(regrs){
inter_regrs <- paste(regrs, collapse = "*")
other_regrs <- setdiff(vars, regrs)
all_regrs <- paste(other_regrs, collapse = "+")
all_regrs <- paste(all_regrs, inter_regrs, sep = "+")
fmla <- as.formula(paste(resp, all_regrs, sep = "~"))
lm(fmla, data = dt)
})
lapply(lm_list, summary)
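Printing 21 summaries is a lot to read, so it may help to name each model after its interaction term and collect just the interaction estimates in one table. The following is only a sketch of that idea on simulated data, not part of the answer above; pairs, terms_chr, fits and inter_tab are hypothetical names, and the formula is written with ":" so the seven main effects are listed explicitly.

```r
# Simulated stand-in data with the same column names as in the question
set.seed(1234)
dt <- as.data.frame(replicate(8, rnorm(100)))
colnames(dt) <- c("assets", "turnover_ratio", "SD", "sharpe_ratio",
                  "return", "expense_ratio", "fund_dummy", "risk_dummy")

vars      <- setdiff(colnames(dt), "return")
pairs     <- combn(vars, 2, simplify = FALSE)            # list of 21 pairs
terms_chr <- vapply(pairs, paste, character(1), collapse = ":")

# One model per interaction: all 7 main effects plus a single a:b term
fits <- lapply(terms_chr, function(term) {
  rhs <- paste(c(vars, term), collapse = " + ")
  lm(as.formula(paste("return ~", rhs)), data = dt)
})
names(fits) <- terms_chr

# Collect the interaction estimate and its p-value from each model
inter_tab <- data.frame(
  term     = terms_chr,
  estimate = vapply(terms_chr, function(tm) coef(fits[[tm]])[[tm]], numeric(1)),
  p_value  = vapply(terms_chr, function(tm)
    summary(fits[[tm]])$coefficients[tm, "Pr(>|t|)"], numeric(1)),
  row.names = NULL
)
print(head(inter_tab))
```

An individual model can then be looked up by name, for example summary(fits[["assets:SD"]]).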
Data creation code.
set.seed(1234)
dt <- replicate(8, rnorm(100))
dt <- as.data.frame(dt)
colnames(dt) <- c("assets","turnover_ratio","SD",
"sharpe_ratio","return","expense_ratio",
"fund_dummy","risk_dummy")
I think this should work and allow you to get rid of the loops:
lm.fit = lm(return ~ (.)^2, data=dt)
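One thing to keep in mind is that the (.)^2 formula fits a single model containing all 21 interactions at once, which is different from 21 models with one interaction each. The sketch below, on simulated data, compares the two; full_fit, base_fit, pairs and single_fits are hypothetical names, and update() is used here as one possible way to add a single term to a main-effects model.

```r
# Simulated stand-in data with the same column names as in the question
set.seed(1234)
dt <- as.data.frame(replicate(8, rnorm(100)))
colnames(dt) <- c("assets", "turnover_ratio", "SD", "sharpe_ratio",
                  "return", "expense_ratio", "fund_dummy", "risk_dummy")

# One model: intercept + 7 main effects + all 21 two-way interactions
full_fit    <- lm(return ~ (.)^2, data = dt)
n_coef_full <- length(coef(full_fit))
print(n_coef_full)

# 21 models: start from the main-effects model and add one interaction each
base_fit <- lm(return ~ ., data = dt)
pairs    <- combn(setdiff(colnames(dt), "return"), 2, simplify = FALSE)
single_fits <- lapply(pairs, function(p) {
  extra <- paste(p, collapse = ":")
  update(base_fit, as.formula(paste(". ~ . +", extra)))
})
n_coef_single <- vapply(single_fits, function(f) length(coef(f)), integer(1))
print(table(n_coef_single))
```

If the goal is the 21 separate models from the question, the second half is the relevant part; the single full model estimates every interaction jointly.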