
How to do the CV test to examine the classification error of LDA in R

Please give me a simple example. I'm stuck! I have tried the errorest function, following the example it gives for 10-fold CV of LDA. But when I used my own data, it just said the predict is not numeric. I don't know why! Thank you! The R code is like this. I want to do binary LDA, so I generate the data:

library(MASS)
n=500
#generate x1 and x2. 
Sigma=matrix(c(2,0,0,1),nrow=2,ncol=2)
#Logistic model with parameter{1,4,-2}
beta.star=c(1,4,-2)
Xtilde=mvrnorm(n=n,mu=c(0.5,2),Sigma=Sigma)
X=cbind(1,Xtilde)
z=X%*%beta.star
#pass through an inv-logit function
pr=exp(z)/(1+exp(z))
#Simulate binary response
#(the probability of response is a vector)
y=rbinom(n,1,pr)

Then I use LDA to fit the model:

library(MASS)
df.cv=data.frame(V1=Xtilde[,1],V2=Xtilde[,2])
exper1<-lda(y~V1+V2,data=df.cv)
plda<-predict(exper1,newdata=df.cv)

Finally, I want to run CV on the original data and see the error. I do this, which is wrong:

mypredict.lda <- function(object, newdata)
  predict(object, newdata = newdata)$class
errorest(y ~ ., data=data.frame(da), model=lda,estimator ="cv", predict= as.numeric(mypredict.lda))

What should I do to get the error with CV?

So we start with all your previous code setting up the fake data:

library(MASS)
n=500
#generate x1 and x2. 
Sigma=matrix(c(2,0,0,1),nrow=2,ncol=2)

#Logistic model with parameter{1,4,-2}
beta.star=c(1,4,-2)
Xtilde=mvrnorm(n=n,mu=c(0.5,2),Sigma=Sigma)
X=cbind(1,Xtilde)
z=X%*%beta.star

#pass through an inv-logit function
pr=exp(z)/(1+exp(z))
#Simulate binary response
y=rbinom(n,1,pr)

#Now we do the LDA
df.cv=data.frame(V1=Xtilde[,1],V2=Xtilde[,2])

Below, we divide the data into two parts: a training set and a test set. Using 0.8 corresponds to 80% train, 20% test, which is one fold of a five-fold cross-validation; if you want something like 10-fold cross-validation, you would use 0.9 instead of 0.8. (Strictly speaking, a single split like this is holdout validation; full k-fold CV repeats the split so every observation is held out exactly once.)

library(ROCR)
inds=sample(1:nrow(df.cv),0.8*nrow(df.cv))
df.train=df.cv[inds,]
df.test=df.cv[-inds,]
train.model = lda(y[inds] ~ V1+V2, data=df.train)
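If you want the full 10-fold procedure rather than a single split, a sketch along these lines should work (assuming the `df.cv` and `y` objects created above; `folds`, `errs`, and `test.idx` are names I'm introducing here):

```r
# Sketch of 10-fold CV by hand: each observation is held out exactly once.
set.seed(1)
k <- 10
folds <- sample(rep(1:k, length.out = nrow(df.cv)))  # random fold labels
errs <- numeric(k)
for (i in 1:k) {
  test.idx <- which(folds == i)
  # fit on the other 9 folds
  fit <- lda(y[-test.idx] ~ V1 + V2, data = df.cv[-test.idx, ])
  # predict the held-out fold
  pred <- predict(fit, df.cv[test.idx, ])$class
  errs[i] <- mean(pred != y[test.idx])
}
mean(errs)  # CV estimate of the classification error
```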

From the trained model, we predict on the test set. Below, I compute the predicted values and then assess the accuracy of the predictions. Here I use a ROC curve, but you can use whatever metric you'd like, I guess. I didn't understand what you meant by error.

preds=as.numeric(predict(train.model, df.test)$class)
actual=y[-inds]
aucCurve=performance(prediction(preds,actual), "tpr", "fpr")
plot(aucCurve)
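If by "error" you meant the plain misclassification rate, it can be read off the same held-out predictions (assuming the `preds` and `actual` from above; note that `as.numeric()` on a factor gives 1/2, not 0/1, hence the shift):

```r
# Fraction of test-set observations that were misclassified
err.rate <- mean((preds - 1) != actual)
err.rate
```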

The area under this ROC curve is a measure of predictive accuracy. Values closer to 1 mean you have good predictive capability.

auc=performance(prediction(preds,actual), "auc")
auc@y.values

Hopefully this helped, and isn't horribly incorrect. Other folks, please chime in with corrections or clarifications.
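For completeness, I believe the original errorest() attempt can be fixed along these lines (a sketch, assuming the ipred package and the simulated `y`/`Xtilde` from above; `df.err` is a name I'm introducing). errorest() wants the response as a factor, and the predict function should return the class labels themselves, not be wrapped in as.numeric():

```r
library(ipred)
library(MASS)

# put the factor response and predictors in one data frame
df.err <- data.frame(y = factor(y), V1 = Xtilde[, 1], V2 = Xtilde[, 2])

# predict function returning class labels, as errorest() expects
mypredict.lda <- function(object, newdata)
  predict(object, newdata = newdata)$class

errorest(y ~ V1 + V2, data = df.err, model = lda,
         estimator = "cv", predict = mypredict.lda,
         est.para = control.errorest(k = 10))
```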


 