K nearest neighbors pseudocode?

Question

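So I'm trying to write the k nearest neighbors algorithm. The input to my function will be a dataset and a sample to classify. I'm just trying to understand how the algorithm works. Can you tell me whether this "pseudocode" of what I'm trying to do is correct?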

kNN (dataset, sample){

   1. Go through each item in my dataset, and calculate the "distance" from that data item to my specific sample.
   2. Out of those samples I pick the "k" ones that are closest to my sample, maybe in a premade array of "k" items?

}

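The part that confuses me is where I say "go through each item in my dataset". Should I go through each CLASS in the dataset and find the k nearest neighbors? And then from there find which one is closest to my sample, which then tells me the class?

Part 2 of the question (ish): suppose I use this algorithm but without a sample. How would I calculate the "accuracy" of the dataset?

I'm really looking for broad answers rather than details, but anything that helps my understanding is appreciated. I'm implementing this in R.

Thanks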

Answer 1

Your pseudocode should be changed as follows:

kNN (dataset, sample){
   1. Go through each item in my dataset, and calculate the "distance" 
   from that data item to my specific sample.
   2. Classify the sample as the majority class among the K samples in 
   the dataset having minimum distance to the sample.
}

This pseudocode is illustrated in the figure below.

[Figure: classes A (red) and B (blue), with the test samples shown as green and purple stars]

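Suppose the dataset consists of two classes, A and B, shown in red and blue respectively, and we want to apply KNN with K=5 to the test samples, shown as green and purple stars.
KNN computes the distance from each test sample to every sample in the dataset, finds the five neighbors with the smallest distance to the test sample, and assigns the majority class among them to the test sample.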
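
The two steps can be sketched roughly as follows. This is only an illustration in Python (the same steps carry over to R), not a reference implementation. It assumes Euclidean distance and a dataset stored as a list of `(features, label)` pairs. The name `knn_classify` and the toy points are made up for the example.

```python
import math
from collections import Counter

def knn_classify(dataset, sample, k=5):
    """Classify `sample` by majority vote among its k nearest items.

    dataset: list of (features, label) pairs (an assumed layout).
    sample:  sequence of numbers with the same length as each features tuple.
    """
    # Step 1: distance from EVERY item in the dataset to the sample,
    # regardless of which class the item belongs to.
    distances = [(math.dist(features, sample), label)
                 for features, label in dataset]
    # Step 2: keep the k items with the smallest distance ...
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # ... and return the class that occurs most often among them.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two classes, A and B.
dataset = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"), ((1.0, 2.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"), ((5.5, 6.5), "B"),
]

print(knn_classify(dataset, (2.0, 2.0), k=5))  # four of the five neighbors are A
print(knn_classify(dataset, (6.0, 7.0), k=5))  # four of the five neighbors are B
```

Note that the loop runs over individual items, not over classes. The class labels only come into play in the final vote.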

Accuracy: 1 - (number of misclassified test samples / number of test samples)
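
As a rough sketch of that formula, one common approach (an assumption here, since the answer does not say where the test samples come from) is to hold out some labelled items as test samples, classify each one against the remaining items, and count the mistakes. The function names and data below are hypothetical, and the code is Python rather than R.

```python
import math
from collections import Counter

def knn_classify(train, sample, k):
    # k nearest items by Euclidean distance, then majority vote
    nearest = sorted(train, key=lambda item: math.dist(item[0], sample))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k):
    # 1 - (number of misclassified test samples / number of test samples)
    misclassified = sum(
        1 for features, label in test
        if knn_classify(train, features, k) != label
    )
    return 1 - misclassified / len(test)

# Hypothetical labelled data, split by hand into training and test items.
train = [
    ((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((2.0, 1.5), "A"),
    ((6.0, 6.0), "B"), ((6.5, 7.0), "B"), ((7.0, 6.5), "B"),
]
test = [
    ((1.2, 1.8), "A"),
    ((6.2, 6.8), "B"),
    ((2.2, 2.0), "B"),  # sits inside the A cluster, so it gets misclassified
    ((5.8, 6.0), "B"),
]

print(accuracy(train, test, k=3))  # one of four test samples is wrong: 0.75
```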

For an implementation in R, you may want to look at this or this.

Solution 1
11 Accepted 2014-04-02 05:34:54
