
Plot boundary lines between classes in python based on multidimensional data?

I am trying to plot the boundary lines for the Iris data set using LDA in sklearn (Python), based on this documentation.

For two-dimensional data, we can easily plot the lines using LDA.coef_ and LDA.intercept_.

But for multidimensional data that has been reduced to two components, LDA.coef_ and LDA.intercept_ still have one entry per original feature, and I don't know how to use them to plot the boundary lines in the 2D reduced-dimension plot.

I've tried plotting with only the first two elements of LDA.coef_ and LDA.intercept_, but it didn't work.

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = datasets.load_iris()

X = iris.data
y = iris.target 
target_names = iris.target_names  

lda = LinearDiscriminantAnalysis(n_components=2)
X_r2 = lda.fit(X, y).transform(X)

x = np.array([-10,10])
y_hyperplane = -1*(lda.intercept_[0]+x*lda.coef_[0][0])/lda.coef_[0][1]

plt.figure()
colors = ['navy', 'turquoise', 'darkorange']
lw = 2

plt.plot(x,y_hyperplane,'k')

for color, i, target_name in zip(colors, [0, 1, 2], target_names):
    plt.scatter(X_r2[y == i, 0], X_r2[y == i, 1], alpha=.8, color=color,
                lw=lw, label=target_name)
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.title('LDA of IRIS dataset')

plt.show()

The boundary line produced by lda.coef_[0] and lda.intercept_[0] does not look like it separates any two classes.


I've also tried using np.meshgrid to draw the class regions, but I get an error like this:

ValueError: X has 2 features per sample; expecting 4

The model expects the 4 dimensions of the original data, not the 2D points from the meshgrid.

Linear discriminant analysis ( LDA ) can be used as a classifier or for dimensionality reduction.

LDA for dimensionality reduction

Dimensionality reduction techniques reduce the number of features. The Iris dataset has 4 features; let's use LDA to reduce it to 2 features so that we can visualise it.

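import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data
y = iris.target

# Standardise the 4 original features
sc = StandardScaler()
X = sc.fit_transform(X)

# Project the data onto the 2 discriminant components
lda = LinearDiscriminantAnalysis(n_components=2)
lda_object = lda.fit(X, y)
X = lda_object.transform(X)

# Scatter plot of the projected data, one colour and marker per class
for l,c,m in zip(np.unique(y),['r','g','b'],['s','x','o']):
    plt.scatter(X[y==l,0],
                X[y==l,1],
                c=c, marker=m, label=l,edgecolors='black')
plt.legend()
plt.show()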


LDA for multi class classification

scikit-learn's LDA handles multi-class classification with one linear score function per class, which can be read in a one-vs-rest fashion. If you have 3 classes you get 3 hyperplanes (decision boundaries), one per class. If there are n features, then each hyperplane is represented by n weights (coefficients) and 1 intercept. In general:

coef_ : shape of (n_classes, n_features)
intercept_ :  shape of (n_classes,)
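
As a quick check of these shapes, here is a minimal sketch on the unreduced Iris data (4 features, 3 classes). It assumes the default solver, and the names scores and manual_pred are only illustrative. It also shows how the per-class weights are used: each class gets a linear score, and the predicted class is the one with the highest score.

```python
import numpy as np
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = datasets.load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis().fit(X, y)

# One row of weights per class, one column per original feature
print(lda.coef_.shape)       # (n_classes, n_features)
print(lda.intercept_.shape)  # (n_classes,)

# One linear score per class and per sample
scores = X @ lda.coef_.T + lda.intercept_

# Picking the class with the highest score reproduces predict()
manual_pred = lda.classes_[np.argmax(scores, axis=1)]
print(np.array_equal(manual_pred, lda.predict(X)))
```

This is why the first two entries of a 4-feature coef_ row cannot be drawn in the reduced 2D plot: those weights refer to the original feature axes, not to the two LDA components.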

Sample, documented inline

import matplotlib.pyplot as plt
import numpy as np
np.random.seed(13)

# Generate 3 linearly separable clusters of 25 points each, with 2 features
centers = np.repeat([[0, 0], [0, 10], [10, 10]], 25, axis=0)
X = centers + np.random.randn(*centers.shape)
y = np.repeat([0, 1, 2], 25)

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
lda = LinearDiscriminantAnalysis()
lda_object = lda.fit(X, y)

# Plot the data points, one colour and marker per class
for l,c,m in zip(np.unique(y),['r','g','b'],['s','x','o']):
    plt.scatter(X[y==l,0],
                X[y==l,1],
                c=c, marker=m, label=l,edgecolors='black')

# x-range over which the hyperplanes are drawn
x1 = np.array([np.min(X[:,0], axis=0), np.max(X[:,0], axis=0)])

# Plot the hyperplane of each class: b + w1*x1 + w2*y1 = 0
for i, c in enumerate(['r','g','b']):
    b, w1, w2 = lda.intercept_[i], lda.coef_[i][0], lda.coef_[i][1]
    y1 = -(b+x1*w1)/w2    
    plt.plot(x1,y1,c=c)
plt.legend()
plt.show()


As you can see, each decision boundary separates one class from the rest (follow the color of the decision boundary).
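
As a complement, the line along which the prediction actually switches between two particular classes is where their two scores are equal, i.e. (w_i - w_j)·x + (b_i - b_j) = 0. The sketch below computes points on that line for 2-feature data similar to the sample above. pairwise_boundary is a hypothetical helper written for this illustration, not part of scikit-learn, and the seed handling differs slightly from the sample.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Three 2-D clusters, similar to the sample above
rng = np.random.RandomState(13)
centers = np.repeat([[0, 0], [0, 10], [10, 10]], 25, axis=0)
X = centers + rng.randn(*centers.shape)
y = np.repeat([0, 1, 2], 25)

lda = LinearDiscriminantAnalysis().fit(X, y)

def pairwise_boundary(lda, i, j, half_length=10.0, n_points=50):
    """Points on the line where classes i and j score equally (2-D data only)."""
    w = lda.coef_[i] - lda.coef_[j]            # normal vector of the line
    b = lda.intercept_[i] - lda.intercept_[j]
    p0 = -b * w / w.dot(w)                     # point on the line closest to the origin
    direction = np.array([-w[1], w[0]]) / np.linalg.norm(w)  # unit vector along the line
    s = np.linspace(-half_length, half_length, n_points)
    return p0 + s[:, None] * direction

line_01 = pairwise_boundary(lda, 0, 1)
scores = lda.decision_function(line_01)

# Along this line the scores of class 0 and class 1 coincide
print(np.allclose(scores[:, 0], scores[:, 1]))
```

The result can be drawn on top of the scatter plot with plt.plot(line_01[:, 0], line_01[:, 1], 'k--'). Parameterising the line by a point and a direction avoids dividing by a weight that may be close to zero.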

Your case

Your dataset has 4 features, so you cannot visualise the data together with the decision boundary directly (human visualisation is limited to 3D). One approach is to use LDA to reduce the dimensions to 2D, and then use LDA again to classify these 2D features.
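
A minimal sketch of that two-stage approach on the Iris data follows. The names reducer, clf_2d, pad and the grid resolution are choices made for this illustration. Because the second LDA is fitted on the 2D projection, its predict() accepts the 2D meshgrid points, which avoids the "X has 2 features per sample; expecting 4" error.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Stage 1: project the 4 original features onto 2 discriminant components
reducer = LinearDiscriminantAnalysis(n_components=2)
X_2d = reducer.fit(X, y).transform(X)

# Stage 2: fit a second LDA on the 2-D projection
clf_2d = LinearDiscriminantAnalysis().fit(X_2d, y)

# Evaluate the 2-D classifier on a grid covering the projected data
pad = 1.0
xx, yy = np.meshgrid(
    np.linspace(X_2d[:, 0].min() - pad, X_2d[:, 0].max() + pad, 300),
    np.linspace(X_2d[:, 1].min() - pad, X_2d[:, 1].max() + pad, 300),
)
grid = np.c_[xx.ravel(), yy.ravel()]
Z = clf_2d.predict(grid).reshape(xx.shape)

# Shade the predicted class regions and overlay the projected samples
colors = ['navy', 'turquoise', 'darkorange']
fig, ax = plt.subplots()
ax.contourf(xx, yy, Z, levels=[-0.5, 0.5, 1.5, 2.5], colors=colors, alpha=0.2)
for color, i, target_name in zip(colors, [0, 1, 2], iris.target_names):
    ax.scatter(X_2d[y == i, 0], X_2d[y == i, 1], alpha=.8, color=color,
               label=target_name)
ax.legend(loc='best', shadow=False, scatterpoints=1)
ax.set_title('LDA of IRIS dataset with 2D decision regions')
```

Call plt.show() (or fig.savefig with a file name of your choice) to display the figure. The edges between the shaded regions are the decision boundaries of the 2D classifier; they describe the second model, fitted on the reduced data, rather than the original 4-feature model.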
