How to optimize parameter k in knn using a genetic algorithm

I am trying to optimize the parameter k in knn using a genetic algorithm in R. I tried the following code but still get an error. I used the accuracy of the knn model for the selected k value as the fitness function. Please help if you know about knn and genetic algorithms. Here is what I've done.

library(caret)
library(GA)
library(class)

#data import 
tea_jenis_F3 <- read.csv("D:/inggrit/program/F3.csv")
str(tea_jenis_F3)

#to check missing data 
anyNA(tea_jenis_F3)

#data slicing
set.seed(101)
intrain_jenis_F3 <- createDataPartition(tea_jenis_F3$category, p= 0.7, list = FALSE)
training_jenis_F3 <- tea_jenis_F3 [intrain_jenis_F3,]
testing_jenis_F3 <- tea_jenis_F3 [-intrain_jenis_F3,]

#transforming the dependent variable to a factor 
training_jenis_F3[["category"]] = factor(training_jenis_F3[["category"]])

#fitness function
fitness_KNN <- function(chromosome)
{
  # First values in chromosome are 'k' of 'knn' method
  tuneGrid <- data.frame(k=chromosome[1])


  # train control
  train_control <- trainControl(method = "cv",number = 10)

  # train the model
  set.seed(1234)
  model <- train(category ~ ., data= training_jenis_F3, trControl=train_control, 
                 method="knn", tuneGrid=tuneGrid)

  # Extract accuracy statistics
  accuracy_val <- model$results$accuracy

}


GA <- ga(type = "real-valued", fitness = fitness_KNN, lower = -10, upper = 10, monitor = NULL)

Error:

Something is wrong; all the Accuracy metric values are missing:
Accuracy       Kappa    
 Min.   : NA   Min.   : NA  
 1st Qu.: NA   1st Qu.: NA  
 Median : NA   Median : NA  
 Mean   :NaN   Mean   :NaN  
 3rd Qu.: NA   3rd Qu.: NA  
 Max.   : NA   Max.   : NA  
 NA's   :1     NA's   :1    
Error: Stopping
In addition: There were 11 warnings (use warnings() to see them)

I would be grateful if you can help me. Thank you.

I think the problem does not lie in your code, but in the method: using a genetic algorithm to optimize k in this setting is not possible, and also not necessary.

You called ga(type = "real-valued", lower = -10, upper = 10, ...), which means ga will search for the best value between -10 and 10. This causes two problems:

  1. Negative values of k are not possible for knn.
  2. ga will produce non-integer values for k, e.g. 1.234, which are of course also not possible.
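As a rough illustration of both problems, the sketch below draws a few values uniformly from [-10, 10] and checks whether any of them could serve as k. Note that runif is only a stand-in for the values a real-valued search might propose (an assumption for illustration, not the GA package's actual sampling code), and the names candidates and is_valid_k are made up here.

```r
# Stand-in for real-valued candidates in [-10, 10] (assumption: uniform draws)
set.seed(1)
candidates <- runif(5, min = -10, max = 10)

# A usable k must be a whole number that is at least 1
is_valid_k <- candidates >= 1 & candidates == round(candidates)

print(candidates)
print(is_valid_k)
```

Continuous draws essentially never land exactly on a whole number, so none of the candidates qualify, whether they are positive or negative.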

Fortunately, a method as complicated as a genetic algorithm is not necessary in this case. If you want to find the best k in the range [1, 10], just compute the model for each value like this:

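library(caret)

k_cands <- 1:10
accuracy <- numeric(length(k_cands))
train_control <- trainControl(method = "cv", number = 10)

for (i in seq_along(k_cands)) {
  # fit knn with one fixed k, using the training data from the question
  set.seed(1234)
  model <- train(category ~ ., data = training_jenis_F3, method = "knn",
                 trControl = train_control,
                 tuneGrid = data.frame(k = k_cands[i]))
  # caret names the results column "Accuracy" (capital A)
  accuracy[i] <- model$results$Accuracy
}

best_k <- k_cands[which.max(accuracy)]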
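The same search can also be left to caret itself: if tuneGrid contains all candidate values, train cross-validates each one and records the best. The sketch below is a minimal example of that. It uses the built-in iris data only so that it runs on its own (an assumption for illustration; the data from the question would take its place), and the names grid and model are arbitrary.

```r
library(caret)

# Candidate values of k, passed to train() as a one-column grid
grid <- data.frame(k = 1:10)

set.seed(1234)
model <- train(Species ~ ., data = iris, method = "knn",
               trControl = trainControl(method = "cv", number = 10),
               tuneGrid = grid)

# One row of cross-validated results per candidate k
print(model$results[, c("k", "Accuracy")])

# The k with the highest cross-validated accuracy
best_k <- model$bestTune$k
print(best_k)
```

This avoids the explicit loop, and all candidates are evaluated on the same cross-validation folds within a single call.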
