
Nearest Neighbor plotting with background colors

I have a dataset of n samples with 6 attributes and two classes.

I am currently using the KNeighborsClassifier from Scikit Learn to classify the dataset's two classes.

I am looking to plot the values of the dataset (across two arbitrary attributes/dimensions of the dataset) and would like the plot to show the split according to the KNeighborsClassifier that I have. In other words, I want to plot the values and have the background regions of the plot match what the classification would be. So, for example, Class 1 would be blue and Class 2 would be red. The points in those areas would have the appropriate color, and the background (where there are no points) would have the appropriate color as well.

However, I can't seem to find help online on how to achieve this. Any help would be appreciated.

Plot two sets of points. For example, assuming that:

  • There are two features (columns) in your data.
  • The class labels are integers.
  • The validation or test data are called X_test and y_test, and the predictions are called y_pred.

You can do something like this:

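import matplotlib.pyplot as plt

plt.scatter(*X_test.T, c=y_test, s=80, cmap='bwr', alpha=0.5)  # actual labels: larger, semi-transparent
plt.scatter(*X_test.T, c=y_pred, s=30, cmap='bwr')  # predictions: smaller, drawn on top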

This plots the actual labels in a larger size (so you can see them sticking out from behind the predictions) and at half opacity (i.e. semi-transparent).

If your labels are strings (like 'A' and 'B') you'll have to encode them to use this approach (or just use something like c=(y_test == 'A') and c=(y_pred == 'A') in the two plot commands). You might also need to adjust the marker sizes to work for your data.
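As a rough, self-contained sketch of the string-label case: the data below is randomly generated and the "predictions" are just the true labels with a few flipped, so X_test, y_test and y_pred here are hypothetical stand-ins for your own arrays. Note that the marker-size keyword of plt.scatter is s.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

# Hypothetical stand-ins for real test data: two features, string labels.
rng = np.random.default_rng(0)
X_test = rng.normal(size=(40, 2))
y_test = np.where(X_test[:, 0] + X_test[:, 1] > 0, "A", "B")

# Fake predictions: copy the truth and flip the first five labels.
y_pred = y_test.copy()
y_pred[:5] = np.where(y_test[:5] == "A", "B", "A")

# A boolean comparison gives 0/1 codes that the colormap can use.
true_codes = (y_test == "A").astype(int)
pred_codes = (y_pred == "A").astype(int)

fig, ax = plt.subplots()
# Actual labels: large, semi-transparent markers.
ax.scatter(*X_test.T, c=true_codes, s=80, cmap="bwr", alpha=0.5)
# Predictions: small markers drawn on top of the actual labels.
ax.scatter(*X_test.T, c=pred_codes, s=30, cmap="bwr")
fig.savefig("actual_vs_predicted.png")
```

Wherever the inner and outer colors differ, the prediction was wrong; here that should be the five flipped points.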

If you're trying to make a plot like the ones in the sklearn docs, then you're in luck, because that page includes all the code you need to make that plot.

The key part is this:

    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

Here, they are making a grid of coordinates representing a fake dataset that covers the space you see in the plot, with h as the spacing between grid points. Essentially, you're making a new X that is a regular grid of data.
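To make the shapes concrete, here is a small sketch of the same grid construction on a made-up three-point dataset. The values of X and h are assumptions chosen only to keep the arrays small.

```python
import numpy as np

# Hypothetical tiny dataset with two feature columns.
X = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 2.0]])
h = 0.5  # assumed grid step; a smaller value gives a finer background

x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Flatten both coordinate arrays and pair them up: one row per grid point,
# in the same (n_points, 2) layout the classifier expects for X.
grid = np.c_[xx.ravel(), yy.ravel()]
print(xx.shape, grid.shape)  # (6, 6) (36, 2)
```

The grid array has the same two-column layout as the training data, which is why it can be passed straight to the classifier.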

Then you classify those points using either the predict or predict_proba method on the classifier. The results need to be reshaped into the grid shape:

Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
Z = Z.reshape(xx.shape)

Then you can look at Z with something like plt.imshow(Z). To line the image up with the data coordinates, pass origin='lower' and extent=(x_min, x_max, y_min, y_max).
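Putting the pieces together, a minimal end-to-end sketch might look like the following. Several things here are assumptions rather than part of the answer above: the data is synthetic (make_classification), the two plotted attributes are arbitrarily columns 0 and 1, n_neighbors=15 and h=0.05 are illustrative values, and contourf is used instead of imshow so the background is drawn directly in data coordinates. The classifier is fit on only the two plotted columns, because a model trained on all 6 attributes cannot predict on a two-column grid.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the real data: 6 attributes, two classes.
X_full, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, random_state=0
)

# Assumption: plot attributes 0 and 1, and fit the classifier on just those.
X = X_full[:, [0, 1]]
clf = KNeighborsClassifier(n_neighbors=15).fit(X, y)

# Regular grid covering the plotted area.
h = 0.05
x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

# Predict a class for every grid point, then restore the grid shape.
Z = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

fig, ax = plt.subplots()
# Background: one filled region per predicted class (0 and 1).
ax.contourf(xx, yy, Z, levels=[-0.5, 0.5, 1.5], cmap="bwr", alpha=0.3)
# Foreground: the samples, colored by their true class.
ax.scatter(X[:, 0], X[:, 1], c=y, cmap="bwr", edgecolors="k", s=20)
ax.set_xlabel("attribute 0")
ax.set_ylabel("attribute 1")
fig.savefig("knn_background.png")
```

With the bwr colormap, class 0 comes out blue and class 1 red for both the points and the background. Swapping predict for predict_proba(...)[:, 1] and dropping the levels argument would give a graded background instead of two flat regions.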
