How to optimize parameter k in knn using a genetic algorithm
I am trying to optimize the parameter k in knn using a genetic algorithm in R. I tried the following code but still receive an error. I used the accuracy of the knn model for the selected k value as the fitness function. Please help me if you know about knn and genetic algorithms. Here is what I've done.
library(caret)
library(GA)
library(class)
#data import
tea_jenis_F3 <- read.csv("D:/inggrit/program/F3.csv")
str(tea_jenis_F3)
#to check missing data
anyNA(tea_jenis_F3)
#data slicing
set.seed(101)
intrain_jenis_F3 <- createDataPartition(tea_jenis_F3$category, p= 0.7, list = FALSE)
training_jenis_F3 <- tea_jenis_F3 [intrain_jenis_F3,]
testing_jenis_F3 <- tea_jenis_F3 [-intrain_jenis_F3,]
#transforming the dependent variable to a factor
training_jenis_F3[["category"]] = factor(training_jenis_F3[["category"]])
#fitness function
fitness_KNN <- function(chromosome)
{
  # First values in chromosome are 'k' of 'knn' method
  tuneGrid <- data.frame(k=chromosome[1])
  # train control
  train_control <- trainControl(method = "cv",number = 10)
  # train the model
  set.seed(1234)
  model <- train(category ~ ., data= training_jenis_F3, trControl=train_control,
                 method="knn", tuneGrid=tuneGrid)
  # Extract accuracy statistics
  accuracy_val <- model$results$accuracy
}
GA <- ga(type = "real-valued", fitness = fitness_KNN, lower = -10, upper = 10, monitor = NULL)
Error:
Something is wrong; all the Accuracy metric values are missing:
    Accuracy       Kappa
 Min.   : NA   Min.   : NA
 1st Qu.: NA   1st Qu.: NA
 Median : NA   Median : NA
 Mean   :NaN   Mean   :NaN
 3rd Qu.: NA   3rd Qu.: NA
 Max.   : NA   Max.   : NA
 NA's   :1     NA's   :1
Error: Stopping
In addition: There were 11 warnings (use warnings() to see them)
I would be grateful if you can help me. Thank you.
I think the problem does not lie in your code, but in the method: using a genetic algorithm to optimize k in this setting is not possible and also not necessary.
You called ga(type = "real-valued", lower = -10, upper = 10, ...), which means ga will search for the best value between -10 and 10. There are now two problems:
1. Negative values for k are not possible for knn.
2. ga will produce non-integer values for k, e.g. 1.234, which are of course also not possible.
Fortunately, it is not necessary to use a method as complicated as a genetic algorithm in this case. If you want to find the best k in the range [1, 10], just compute the model for each value like this:
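k_cands <- 1:10
accuracy <- numeric()
train_control <- trainControl(method = "cv", number = 10)
for (k in k_cands) {
  # fit knn for a single candidate k, reusing the train() call from the question
  set.seed(1234)
  model <- train(category ~ ., data = training_jenis_F3, method = "knn",
                 trControl = train_control, tuneGrid = data.frame(k = k))
  # caret names the results column "Accuracy" (capital A); "$accuracy" returns NULL
  accuracy <- c(accuracy, model$results$Accuracy)
}
best_k <- k_cands[which.max(accuracy)]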
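The same search can probably be written without an explicit loop, because caret's train() accepts several candidate values in tuneGrid and cross-validates each of them. The following is only a minimal sketch under stated assumptions: the built-in iris data set stands in for the F3.csv data from the question, and the names in_train and training are made up for the illustration.

```r
library(caret)

# Stand-in data (assumption): iris replaces the F3.csv data from the question.
set.seed(101)
in_train <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
training <- iris[in_train, ]

# Evaluate every candidate k with the same 10-fold cross-validation.
train_control <- trainControl(method = "cv", number = 10)
set.seed(1234)
model <- train(Species ~ ., data = training, method = "knn",
               trControl = train_control,
               tuneGrid = data.frame(k = 1:10))

# One row per candidate k; the accuracy column is spelled "Accuracy".
print(model$results[, c("k", "Accuracy")])

# k with the highest cross-validated accuracy, as selected by caret.
best_k <- model$bestTune$k
```

With the data from the question, the formula would be category ~ . and the data argument would be training_jenis_F3, as in the original train() call.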