Linear Discriminant Analysis in R - Training and validation samples

I am working with the lda command to analyze a 2-column, 234-row dataset (x): column X1 contains the predictor variable (metric) and column X2 the grouping variable (categorical, 4 categories). I would like to build a linear discriminant model using 150 observations and then use the other 84 observations for validation. After randomly partitioning the data I get x.build and x.validation, with 150 and 84 observations respectively. I run the following:

fit = lda(x.build$X2~x.build$X1, data=x.build, na.action="na.omit")

Then I run the predict command like this:

pred = predict(fit, newdata=x.validation)

From reading the command's documentation I thought that pred$class would contain the classification of the validation data according to the fitted model, but I get the classification of the 150 training observations instead of the 84 I intended to use as validation data. I don't really know what is happening. Can someone please give me an example of how I should conduct this analysis?

Thank you very much in advance.

Try this instead:

fit = lda(X2~X1, data=x.build, na.action="na.omit")
pred = predict(fit, newdata=x.validation)

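If you build the model with the formula x.build$X2~x.build$X1, predict looks for a variable named x.build$X1 when it processes the validation data. x.validation has no such column, so R evaluates x.build$X1 from your workspace instead, and you get predictions for the 150 training observations. Writing the formula with bare column names (X2~X1) and passing the data frame through data= lets predict find X1 in newdata.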
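For a complete picture, here is a minimal sketch of the whole workflow. Your data are not shown in the question, so the data frame below is simulated and only mimics the description (234 rows, metric X1, 4-level factor X2); the group means, level names, seed, and the index name build.idx are all made up for illustration.

```r
library(MASS)  # provides lda() and its predict() method

set.seed(1)  # arbitrary seed, only so the random split is reproducible

# Simulated stand-in for the real data (assumption: not the asker's data).
# X1 is a metric predictor, X2 a grouping factor with 4 levels.
x <- data.frame(
  X1 = rnorm(234, mean = rep(c(0, 2, 4, 6), length.out = 234)),
  X2 = factor(rep(c("a", "b", "c", "d"), length.out = 234))
)

# Random partition: 150 rows to build the model, the remaining 84 to validate.
build.idx    <- sample(nrow(x), 150)
x.build      <- x[build.idx, ]
x.validation <- x[-build.idx, ]

# Bare column names in the formula; the data frame goes in 'data'.
fit  <- lda(X2 ~ X1, data = x.build, na.action = na.omit)
pred <- predict(fit, newdata = x.validation)

# One predicted class per validation row.
length(pred$class)

# Confusion table: predicted group against the true group in the validation set.
table(predicted = pred$class, actual = x.validation$X2)
```

The confusion table at the end is one simple way to judge how well the model classifies the held-out observations; the counts on its diagonal are the correctly classified validation rows.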
