How to make predictions using an LDA (linear discriminant analysis) model in R

As the title suggests, I am trying to make predictions using an LDA model in R. I have two sets of data: the first is a series of entries with 16 predictor variables and 1 outcome variable (the outcome is the "group" each entry belongs to, which I assigned myself); the second consists of entries with the same 16 predictor variables but no outcome variable. What I would like to do is predict the group membership of the entries in the second set of data.

So far I've managed to create an LDA model by separating the first dataset into a "training set" and a "test set". However, now that I have the model, I don't know how to go about predicting the group membership of the entries in my second dataset.

Thanks for the help. Please let me know if any more information is required. This is my first post on Stack Overflow, so I am still learning the ropes.

Short example based on An Introduction to Statistical Learning, chapter 4. Say you have fitted a model lda_model on a training_set, with the dependent variable Group, which you aim to predict, and predictors Predictor1 and Predictor2:

library(MASS)
lda_model <- lda(Group ~ Predictor1 + Predictor2, data = training_set)

You can then make predictions with lda_model by calling the predict function on the testing_set. The same call works on any data frame that contains the predictor columns, including a dataset with no outcome variable:

lda_predictions <- predict(lda_model, newdata = testing_set)

lda_predictions then holds, in $posterior, the posterior probability that each observation belongs to Group j (one column per group), and in $class the most probable group for each observation.
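To make this concrete for the case in the question (a second dataset with no outcome column), here is a minimal sketch. It assumes the built-in iris data as a stand-in for your data, with Species playing the role of Group; the names labelled_set, unlabelled_set and new_predictions are made up for illustration.

```r
library(MASS)  # provides lda()

set.seed(1)

# iris stands in for the labelled first dataset (assumption, not your data)
train_idx <- sample(nrow(iris), 100)
labelled_set <- iris[train_idx, ]

# Drop the outcome column to mimic the second, unlabelled dataset
unlabelled_set <- iris[-train_idx, names(iris) != "Species"]

# Fit on the labelled data, using every other column as a predictor
lda_model <- lda(Species ~ ., data = labelled_set)

# predict() only needs the predictor columns to be present in newdata
new_predictions <- predict(lda_model, newdata = unlabelled_set)

head(new_predictions$class)      # predicted group for each new entry
head(new_predictions$posterior)  # one column of probabilities per group
```

The column names in the new data have to match the predictor names used when fitting, otherwise predict cannot find them.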

You could then apply a threshold of, for instance (but not limited to), 50% probability. E.g.:

sum(lda_predictions$posterior[, 7] >= .5)

returns the number of observations for which the probability of belonging to Group 7 (the seventh column of $posterior) is at least 50%.
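A runnable version of the thresholding idea might look like the sketch below. It again assumes iris in place of your data, so the group is picked by name ("virginica") rather than by column number, and the 0.9 cutoff is an arbitrary value chosen for illustration.

```r
library(MASS)  # provides lda()

# iris is a stand-in dataset (assumption); Species plays the role of Group
lda_model <- lda(Species ~ ., data = iris)
lda_predictions <- predict(lda_model, newdata = iris)
post <- lda_predictions$posterior

# Number of observations with at least 50% posterior probability of "virginica"
n_virginica <- sum(post[, "virginica"] >= 0.5)

# Most probable group per row, and the probability attached to it
best_group <- colnames(post)[max.col(post, ties.method = "first")]
best_prob <- apply(post, 1, max)

# Flag predictions that fall below the illustrative 0.9 cutoff
uncertain <- best_prob < 0.9
table(best_group, uncertain)
```

Indexing the posterior matrix by column name rather than position avoids depending on the order of the factor levels.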
