How to perform least squares regression in R given training and testing data with class labels?

I have a 63*62 training set, which includes the class labels. The test data is 25*62 and includes the class labels too. Given this, how would I perform least squares regression? I am using the code:

res = lm(height~age)

What do height and age correspond to? When I have 61 features + 1 class (making 62 columns in the training data), how would I input the parameters?

Also, how do I apply the model to the testing data?

If you have 62 columns you may want to use the more general formula:

res = lm(height ~ . , data = mydata)

Notice how the period '.' represents the rest of the variables, that is, every column of mydata other than the response. But the other answer is right that there are nearly as many variables as observations (61 predictors plus an intercept against 63 rows), so whatever fit comes out is practically useless.
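As a rough sketch of how the '.' formula and the test-set step from the question might fit together, the following uses synthetic data with the same shapes as in the question. The data frames `train` and `test`, the column name `label`, the 0/1 coding of the class, and the 0.5 threshold are all assumptions made for illustration; none of them come from the original post.

```r
# Synthetic stand-in for the data in the question (all names are hypothetical).
set.seed(1)
n_train <- 63; n_test <- 25; p <- 61

make_data <- function(n) {
  x <- as.data.frame(matrix(rnorm(n * p), nrow = n))  # feature columns V1..V61
  x$label <- rbinom(n, 1, 0.5)                        # assumed numeric 0/1 class label
  x
}
train <- make_data(n_train)  # 63 x 62
test  <- make_data(n_test)   # 25 x 62

# Regress the label on every other column of the training data.
fit <- lm(label ~ ., data = train)

# Apply the fitted model to the test set; predict() matches columns by name.
pred <- predict(fit, newdata = test)

# Assumed decision rule: threshold the fitted values at 0.5.
pred_class <- as.integer(pred > 0.5)
accuracy <- mean(pred_class == test$label)
```

Because `predict()` looks up predictors by name, the test data frame needs the same column names as the training one. With labels generated at random, as here, the accuracy should hover around chance.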

height and age are simply the labels of columns in your data frame. height is the predicted (response) variable. You can have as many predictor variables there as you wish: res = lm(height~age+weight+gender)
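To make the column-name point concrete, here is a small sketch with a made-up data frame. The data frame `people` and all of its values are invented for illustration; only the way the formula refers to column names matters.

```r
# Made-up illustrative data; only the column names matter here.
people <- data.frame(
  height = c(150, 160, 165, 172, 180, 175, 168, 158),
  age    = c(12, 14, 15, 17, 20, 19, 16, 13),
  weight = c(40, 50, 55, 63, 75, 70, 60, 45),
  gender = factor(c("f", "m", "f", "m", "m", "m", "f", "f"))
)

# Left of ~ is the response; right of ~ are the predictors, joined by +.
res <- lm(height ~ age + weight + gender, data = people)

# One coefficient per numeric predictor, one per non-reference factor level,
# plus the intercept.
coef(res)
```

Passing `data = people` tells `lm` where to look up the names in the formula, so the columns do not need to exist as separate variables in the workspace.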

However, I must say that the question seems a bit strange to me, because if you fit a regression with 62 coefficients (61 predictors plus an intercept) to only 63 training points, you will get an almost exact fit no matter what the data look like. The training set should always be (significantly) larger than the number of variables used.
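The limiting case, where the number of coefficients equals the number of points, can be checked directly. In this sketch the predictors and the response are independent random noise (the names `noise` and `saturated` are hypothetical), yet least squares still reproduces the response exactly.

```r
set.seed(42)
n <- 10

# 9 random predictors and a response that is unrelated to them.
noise <- as.data.frame(matrix(rnorm(n * (n - 1)), nrow = n))
noise$y <- rnorm(n)

# 9 slopes + 1 intercept = 10 coefficients for 10 rows.
saturated <- lm(y ~ ., data = noise)

max(abs(residuals(saturated)))  # numerically zero: the fit interpolates pure noise
df.residual(saturated)          # no residual degrees of freedom left
```

A perfect fit here says nothing about predictive value, which is why a model fitted this way would not be expected to do well on the held-out test rows.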
