My problem is that I have 3 features, but I only want to plot 2D graphics, using 2 features at a time and showing all the possible combinations.

The problem is that I did `classifier.fit(X_train, Y_train)`, so it expects to be trained with 3 features, not just 2. `X_train` has shape `(70, 3)`, which is `(n_samples, n_features)`.

So far I have tweaked the original code to add `z_min` and `z_max`, since I do need this third feature to be able to call `classifier.predict()`.

The error I get at the `plt.contourf` instruction is `Input z must be a 2D array`.
```python
import numpy as np
import matplotlib.pyplot as plt

x_min, x_max = X_train[:, 0].min() - 1, X_train[:, 0].max() + 1
y_min, y_max = X_train[:, 1].min() - 1, X_train[:, 1].max() + 1
z_min, z_max = X_train[:, 2].min() - 1, X_train[:, 2].max() + 1

xx, yy, zz = np.meshgrid(np.arange(x_min, x_max, 0.1),
                         np.arange(y_min, y_max, 0.1),
                         np.arange(z_min, z_max, 0.1))

fig, ax = plt.subplots()

# here "classifier" is the model's prediction (classification) function,
# called on every point of the 3D grid
Z = classifier.predict(np.c_[xx.ravel(), yy.ravel(), zz.ravel()])

# Put the result into a color plot
Z = Z.reshape(len(Z.shape), 2)  # <- this is where the shape goes wrong
plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)  # raises "Input z must be a 2D array"
plt.axis('off')

# Plot also the training points
plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.Paired)
```
```python
print(Z.shape)   # (4612640,)
print(xx.shape)  # (20, 454, 508)
```
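For reference, a quick check of these shapes (a minimal numpy sketch using the numbers printed above) shows why the reshape cannot work:

```python
import numpy as np

# xx has shape (20, 454, 508), and Z (the flattened grid prediction)
# has one entry per grid point, so its length is the product:
n = 20 * 454 * 508
print(n)  # 4612640

# contourf wants Z shaped like one 2D slice, e.g. (454, 508), but the
# flattened 3D prediction has 20x too many values to fit that shape:
try:
    np.zeros(n).reshape(454, 508)
except ValueError as err:
    print(err)
```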
How can I train with 3 features but only plot 2 of them, while keeping the right shape for my array `Z`? How can I get `Z` to the right size?
What I tried so far: I want something like this, but instead I have 3 features, and `predict()` needs all 3 values, not 2 like in the example. All the examples I'm seeing train with only 2 features, so from my understanding they are good to go; they never face my problem of `Z` not having the right shape.
Would it also be possible to visualize this with a 3D graphic, so we can see all 3 features?
I don't think the shape/size is the main issue here. You have to do some calculation before you can plot a 2D decision surface (`contourf`) for a 3D feature space. A correct contour plot requires a single defined value `Z` for each `(X, Y)` pair. Take your example and look at just `xx` and `yy`:
```python
import pandas as pd

df = pd.DataFrame({'x': xx.ravel(),
                   'y': yy.ravel(),
                   'Class': Z.ravel()})
xy_summ = df.groupby(['x', 'y']).agg(lambda x: x.value_counts().to_dict())
xy_summ = (xy_summ.drop('Class', axis=1)
                  .reset_index()
                  .join(pd.DataFrame(list(xy_summ.Class)))
                  .fillna(0))
xy_summ[[0, 1, 2]] = xy_summ[[0, 1, 2]].astype(int)
xy_summ.head()
```
You would find that for each pair of `xx` and `yy` values you get 2 or 3 possible classes, depending on what `zz` is there:
```
    xx   yy   0   1   2
0  3.3  1.0  25  15  39
1  3.3  1.1  25  15  39
2  3.3  1.2  25  15  39
3  3.3  1.3  25  15  39
4  3.3  1.4  25  15  39
```
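Incidentally, per-`(x, y)` class counts like these can also be computed more directly. This is a sketch on tiny made-up data (not the code above), using a groupby/size/unstack instead of the dict aggregation:

```python
import pandas as pd

# Toy data: three grid points at (x=0, y=0) with classes {0, 0, 1},
# and two at (x=1, y=0) with class 2.
df = pd.DataFrame({'x':     [0, 0, 0, 1, 1],
                   'y':     [0, 0, 0, 0, 0],
                   'Class': [0, 0, 1, 2, 2]})

# One row per (x, y) pair, one column per class, cells = counts.
counts = df.groupby(['x', 'y', 'Class']).size().unstack(fill_value=0)
print(counts.loc[(0, 0), 0])  # 2
print(counts.loc[(0, 0), 1])  # 1
print(counts.loc[(1, 0), 2])  # 2
```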
Therefore, to make a 2D `contourf` work, you have to decide which `Z` you'd like to call from the 2 or 3 possibilities. For example, you can have a weighted class call like:
```python
xy_summ['weighed_class'] = (xy_summ[1] + 2 * xy_summ[2]) / xy_summ[[0, 1, 2]].sum(1)
```
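As a worked check of this formula, take the first row of the table above (counts 25, 15, 39 for classes 0, 1, 2):

```python
# Weighted class for counts 25, 15, 39 (classes 0, 1, 2):
# (15 + 2 * 39) / (25 + 15 + 39) = 93 / 79, i.e. about 1.18,
# so this (x, y) cell leans toward class 2.
c0, c1, c2 = 25, 15, 39
weighed_class = (c1 + 2 * c2) / (c0 + c1 + c2)
print(round(weighed_class, 2))  # 1.18
```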
This will then allow you to draw a successful 2D plot:
```python
import itertools
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl

iris = load_iris()
X = iris.data[:, 0:3]
Y = iris.target
clf = DecisionTreeClassifier().fit(X, Y)

plot_step = 0.1
a, b, c = np.hsplit(X, 3)
ar = np.arange(a.min() - 1, a.max() + 1, plot_step)
br = np.arange(b.min() - 1, b.max() + 1, plot_step)
cr = np.arange(c.min() - 1, c.max() + 1, plot_step)
aa, bb, cc = np.meshgrid(ar, br, cr)
Z = clf.predict(np.c_[aa.ravel(), bb.ravel(), cc.ravel()])

datasets = [[0, len(ar), aa],
            [1, len(br), bb],
            [2, len(cr), cc]]

for i, (xsets, ysets) in enumerate(itertools.combinations(datasets, 2)):
    xi, xl, xx = xsets
    yi, yl, yy = ysets
    df = pd.DataFrame({'x': xx.ravel(),
                       'y': yy.ravel(),
                       'Class': Z.ravel()})
    xy_summ = df.groupby(['x', 'y']).agg(lambda x: x.value_counts().to_dict())
    xy_summ = (xy_summ.drop('Class', axis=1)
                      .reset_index()
                      .join(pd.DataFrame(list(xy_summ.Class)))
                      .fillna(0))
    xy_summ['weighed_class'] = (xy_summ[1] + 2 * xy_summ[2]) / xy_summ[[0, 1, 2]].sum(1)
    xyz = (xy_summ.x.values.reshape(xl, yl),
           xy_summ.y.values.reshape(xl, yl),
           xy_summ.weighed_class.values.reshape(xl, yl))

    ax = plt.subplot(1, 3, i + 1)
    ax.contourf(*xyz, cmap=mpl.cm.Paired)
    ax.scatter(X[:, xi], X[:, yi], c=Y, cmap=mpl.cm.Paired, edgecolor='black')
    ax.set_xlabel(iris.feature_names[xi])
    ax.set_ylabel(iris.feature_names[yi])

plt.show()
```
If I understand this correctly, "visualize this with a 3D graph" will be difficult. You have not only 3 features, which make the space 3D, but also a class call on top. In the end, you actually have to work with 4D data, or density-like data in 3D space. I guess this might be the reason why a 3D decision-space (not really a surface anymore) graph is not very common.
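That said, if all you want is to see the three features together (without the decision regions), a plain 3D scatter of the training points is straightforward. This is my own minimal sketch on the same iris data used above, not a decision-space plot:

```python
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 -- registers the '3d' projection
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data[:, 0:3]  # the same 3 features as above
Y = iris.target

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# One point per sample, colored by its class.
ax.scatter(X[:, 0], X[:, 1], X[:, 2], c=Y, cmap=plt.cm.Paired, edgecolor='black')
ax.set_xlabel(iris.feature_names[0])
ax.set_ylabel(iris.feature_names[1])
ax.set_zlabel(iris.feature_names[2])
plt.show()
```

This shows the classes in the feature space itself; the class call is carried by color rather than a fourth axis.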