
K nearest neighbor pseudocode?

So I am trying to code up the k nearest neighbor algorithm. The input to my function would be a set of data and a sample to classify. I am just trying to understand the workings of the algorithm. Can you guys tell me if this "pseudocode" of what I am trying to do is correct?

kNN (dataset, sample){

   1. Go through each item in my dataset, and calculate the "distance" from that data item to my specific sample.
   2. Out of those samples I pick the "k" ones that are most close to my sample, maybe in a premade array of "k" items?

}

The part I get confused by is when I say "go through each item in my dataset". Should I be going through each CLASS in my dataset and finding the k nearest neighbors, and then from there finding which one is closest to my sample, which then tells me the class?

Part 2 question(ish): what about using this algorithm without a sample? How would I calculate the "accuracy" of the data set?

I am really looking for broad answers rather than specifics, but anything that helps me understand is appreciated. I am implementing this in R.

Thanks

Your pseudocode should change this way:

kNN (dataset, sample){
   1. Go through each item in my dataset, and calculate the "distance" 
   from that data item to my specific sample.
   2. Classify the sample as the majority class among the K items in 
   the dataset having minimum distance to the sample.
}

This pseudocode is illustrated in the following figure.

[Figure: a data set with two classes, A (red) and B (blue), and two test samples marked with green and purple stars]

Suppose the data set consists of two classes A and B, shown as red and blue respectively, and we want to apply KNN with K=5 to two test samples, shown as green and purple stars.
KNN computes the distance from each test sample to every sample in the data set, finds the five neighbors at minimum distance from the test sample, and assigns the majority class among those five to the test sample.
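
The two steps of the pseudocode above could be sketched in base R roughly as follows. This is only an illustration, not a reference implementation: the function name `knn_classify`, the choice of Euclidean distance, and the toy data are all assumptions made for the example.

```r
# Minimal kNN classifier sketch (base R only, names are hypothetical).
# train:  numeric matrix, one row per data item
# labels: class label of each row of train
# sample: numeric vector to classify
# k:      number of neighbors that vote
knn_classify <- function(train, labels, sample, k = 5) {
  # Step 1: Euclidean distance (an assumed metric) from every item to the sample
  diffs <- sweep(train, 2, sample)   # subtract the sample from each row
  dists <- sqrt(rowSums(diffs^2))

  # Step 2: take the k closest items, regardless of their class ...
  nearest <- order(dists)[seq_len(k)]

  # ... and return the most frequent class among them
  # (on a tie, which.max keeps the first class in table order)
  votes <- table(labels[nearest])
  names(votes)[which.max(votes)]
}

# Toy data (made up): class A near (0, 0), class B near (5, 5)
train <- rbind(c(0, 0), c(0, 1), c(1, 0), c(1, 1),
               c(5, 5), c(5, 6), c(6, 5), c(6, 6))
labels <- c("A", "A", "A", "A", "B", "B", "B", "B")

knn_classify(train, labels, c(0.5, 0.5), k = 5)   # "A"
knn_classify(train, labels, c(5.5, 5.5), k = 5)   # "B"
```

Note that the loop is over every item in the data set, not over each class: the classes only come into play at the vote in step 2.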

Accuracy: 1 - (Number of misclassified test samples / Number of test samples)
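
As a small worked example of this formula in R, assume the test samples are items whose true class is already known and that a classifier has produced a prediction for each. The two label vectors below are invented purely for illustration.

```r
# Hypothetical true and predicted classes for six test samples
actual    <- c("A", "A", "B", "B", "A", "B")
predicted <- c("A", "B", "B", "B", "A", "A")

# Count the test samples whose predicted class differs from the true class
misclassified <- sum(predicted != actual)        # 2

# Accuracy = 1 - (misclassified / number of test samples)
accuracy <- 1 - misclassified / length(actual)
accuracy                                         # 0.6666667
```

The same number can be obtained as `mean(predicted == actual)`, the fraction of test samples classified correctly.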

For an implementation in R, you may see either this or this.

