How to fit a multiple linear regression model on 1664 explanatory variables in R
I have one response variable, and I'm trying to find a way of fitting a multiple linear regression model using 1664 different explanatory variables. I'm quite new to R and was taught to do this by writing out each of the explanatory variables in the formula. However, as I have 1664 variables, that would take too long. Is there a quicker way of doing this?
Thank you!
I think you want to select from the 1664 variables a valid model, i.e. a model that explains as much of the variability in the data as possible with as few explanatory variables as possible. There are several ways of doing this; for example, see stepAIC (in the MASS package) for a way of doing this using the Akaike Information Criterion.
Correlating 1664 variables with the data will yield around 83 significant correlations purely by chance if you choose a 5% significance level (0.05 * 1664). So, tread carefully with automatic variable selection.
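The arithmetic behind the "around 83" figure can be checked with a small simulation. This is only a sketch on made-up data: the number of observations `n`, the seed, and all variable names are assumptions, and both the response and the predictors are pure noise, so any "significant" correlation here is a false positive.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 200                         # hypothetical number of observations
p <- 1664                        # number of explanatory variables, as in the question

y <- rnorm(n)                    # response: pure noise
X <- matrix(rnorm(n * p), n, p)  # predictors: pure noise, unrelated to y

# p-value of the correlation test between y and each column of X
pvals <- apply(X, 2, function(x) cor.test(x, y)$p.value)

expected <- 0.05 * p             # 83.2 false positives expected by chance
observed <- sum(pvals < 0.05)    # count actually seen in this simulated draw
c(expected = expected, observed = observed)
```

The observed count varies from run to run but should land in the neighbourhood of the expected value, even though none of the predictors is related to the response.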
Cutting down the number of variables using expert knowledge or a decorrelation technique (e.g. principal component analysis) would help.
For a code example, you first need to include an example of your own (data + code) on which I can build.
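Absent real data, here is a minimal sketch of the principal component route on simulated data. Everything in it is an assumption made for illustration: the sizes `n` and `p`, the simulated response, and especially `k`, the number of components kept, which would need to be chosen with care on real data.

```r
set.seed(2)                              # arbitrary seed
n <- 100                                 # hypothetical number of observations
p <- 50                                  # hypothetical number of predictors (kept small here)

X <- matrix(rnorm(n * p), n, p)
colnames(X) <- paste0("x", seq_len(p))
y <- X[, 1] - 2 * X[, 2] + rnorm(n)      # made-up response

# Principal components of the centred and scaled predictors
pca <- prcomp(X, center = TRUE, scale. = TRUE)

k <- 10                                  # hypothetical number of components to keep
scores <- as.data.frame(pca$x[, 1:k])    # columns PC1 ... PC10, mutually uncorrelated
scores$y <- y

# Regress the response on the first k component scores
fit <- lm(y ~ ., data = scores)
summary(fit)
```

The components are uncorrelated with one another, which is the decorrelation being referred to; the price is that each coefficient now describes a mixture of the original variables rather than a single one.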
I'll answer the programming question, but note that a regression with that many variables could often benefit from some sort of variable selection procedure (e.g. @PaulHiemstra's suggestions).
To use every column of the data frame other than y as an explanatory variable:
form <- y ~ .
where the dot indicates all variables not yet mentioned. To use only a chosen set of variables, assuming myVars is a character vector of their names:
form <- as.formula( paste( "y ~", paste(myVars, collapse="+") ) )
Then run your regression:
lm( form, data=dat )
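A small end-to-end sketch of both approaches, on a made-up data frame (`dat`, its columns, and the contents of `myVars` are all hypothetical). Note that joining the names into a single string requires `collapse`, not `sep`, in the inner `paste()`. Base R's `reformulate()` is shown as an alternative that builds the same formula without string pasting.

```r
set.seed(3)                                # arbitrary seed
n <- 50
dat <- data.frame(y  = rnorm(n),           # hypothetical response
                  x1 = rnorm(n),
                  x2 = rnorm(n),
                  x3 = rnorm(n))

# 1. Every column other than y as an explanatory variable
fit_all <- lm(y ~ ., data = dat)

# 2. Only a chosen subset, named in a character vector
myVars <- c("x1", "x3")
form <- as.formula(paste("y ~", paste(myVars, collapse = " + ")))
fit_sub <- lm(form, data = dat)

# 3. Same subset via reformulate()
form2 <- reformulate(myVars, response = "y")
fit_sub2 <- lm(form2, data = dat)

names(coef(fit_all))                       # intercept plus x1, x2, x3
all.equal(coef(fit_sub), coef(fit_sub2))   # both routes give the same fit
```

With 1664 columns, `myVars` could be taken from `names(dat)` rather than typed out, for instance `setdiff(names(dat), "y")`.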