
R, linear regression, lm, approx

I would like to use linear regression to estimate the Concentration from the count. This is a sample of my dataset:

Concentration              count#0
Ctcf                          3153
Err                           2228
Nkx3-2                           4
Isl/                             6
Engrailed                       10
Dr                              14
Usf                            461
Dach1/Dac                     4185
POS_C(8)         139664       1143
POS_A(128)      2234624       8897
POS_F(0.125)       2182         20
POS_D(2)          34916        220
POS_B(32)        558656       3359
POS_E(0.5)         8729         21

I am wondering whether it is better to use lm and then predict, or to use approx and approxfun. I am not an expert in statistics and I didn't find any explanation on the Internet. Thanks!

lm is what you use if you want to fit an ordinary linear regression (LR). If you believe your response can be well described by a linear combination of your predictors, then LR may be appropriate. You don't need the data to be normal for an LR to work, but you do need (approximate) normality of the errors if you're going to compute test statistics for the parameters and such. Also, if you're interested in inference and coefficient interpretation, don't forget to check the usual diagnostics (residuals with mean 0, constant variance, and no trends; outliers; multicollinearity; normality; etc.).
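To make this concrete, here is a minimal sketch of the lm-then-predict route, built from the POS_* rows of the sample above (the known concentrations in parentheses paired with their counts). The data frame and column names here are assumptions for illustration; adapt them to your actual data:

```r
# Known standards: POS_* rows have a known concentration and an observed count
pos <- data.frame(
  Concentration = c(8, 128, 0.125, 2, 32, 0.5),
  count         = c(1143, 8897, 20, 220, 3359, 21)
)

# Regress Concentration on count, so predict() returns concentrations directly
fit <- lm(Concentration ~ count, data = pos)

# Estimate the concentration for probes where only counts were observed
new_counts <- data.frame(count = c(3153, 2228, 461))
predict(fit, newdata = new_counts)
```

predict() can also return confidence or prediction intervals via its interval argument, which is one practical advantage of lm over plain interpolation.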

The actual model for an LR is Y = X %*% beta + e, where Y, beta, and e are vectors, X is a matrix, and %*% denotes matrix multiplication. This notation assumes that the first column of X is all 1's. By default lm uses a QR decomposition, which allows it to avoid computing the inverse of t(X) %*% X, and even t(X) %*% X itself, which is a big time saver if X is large.

lm finds (but not by direct computation) solve(t(X) %*% X) %*% t(X) %*% Y, which gives us the unique (provided X is full rank) estimate of beta.
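You can verify on toy data that this normal-equations formula agrees with what lm returns (remembering that lm itself uses a QR decomposition rather than this formula):

```r
# Toy data: y is roughly 3 + 2x plus small noise
set.seed(1)
x <- runif(20)
y <- 3 + 2 * x + rnorm(20, sd = 0.1)

X <- cbind(1, x)                                   # design matrix with intercept column
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y       # normal-equations solution

fit <- lm(y ~ x)
all.equal(as.vector(beta_hat), unname(coef(fit)))  # TRUE (up to floating-point tolerance)
```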

You definitely do not want to use anything else if a plain LR is all you want.
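For contrast, approxfun does not fit a model at all: it linearly interpolates between the observed points, so it passes through every data point exactly (noise included) and by default returns NA outside the observed range. A minimal sketch with made-up points:

```r
x <- c(1, 2, 4)
y <- c(10, 30, 50)

f <- approxfun(x, y)  # piecewise-linear interpolation through the points
f(3)                  # 40: halfway between (2, 30) and (4, 50)
f(5)                  # NA: outside the observed x range by default
```

That inability to extrapolate, and the fact that interpolation chases noise instead of averaging over it, is why lm is the better tool for estimating concentrations from noisy counts.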

