Visualization (2D) of SVM in Python

I have an assignment, described below. I have done the first five tasks but am stuck on the last one: plotting the results. Please explain how to do it. Thank you in advance.

*(I started learning SVM and ML only a few days ago, so please take that into account.)

**(I think the sequence of steps for plotting should be the same for every kernel type, so an example for even one of them would be great. I will try to adapt your code for the others.)

The procedure to follow:

  1. Randomly take 100 samples from this map and bring them into Python for SVC. The dataset includes Easting, Northing and Rock information.

  2. Randomly split these 100 samples into training and test datasets.

  3. Run the SVC with linear, polynomial, radial basis function and tangent (sigmoid) kernels.

  4. Find the best parameters for each kernel. For instance, with a radial basis function kernel, find which "C" and "gamma" are optimal based on the accuracy scores you get.

  5. Once you have the fitted models and have calculated their accuracy scores (on the test dataset), feed the whole dataset into the fitted models and predict the output for all 90,000 sample points in reference.csv.

  6. Show me the resulting maps and the accuracy score of each fitted model.
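Steps 1 to 5 can be sketched end to end as below. This is a minimal sketch under stated assumptions, not the assignment's expected solution: the data is a hypothetical synthetic 300 x 300 grid (90,000 points) standing in for reference.csv, the parameter grids are deliberately small, and wrapping `StandardScaler` and `SVC` in a scikit-learn `Pipeline` is a substitution for the separate scaler used in the question. The pipeline refits the scaler inside each cross-validation fold and, when predicting on the full dataset, applies the training-set scaling automatically. Note that scikit-learn's `"sigmoid"` kernel is tanh(gamma * <x, x'> + coef0), i.e. the "tangent" kernel of the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical stand-in for reference.csv: a 300 x 300 regular grid (90,000 points)
# with two rock classes separated by a curved boundary.
east, north = np.meshgrid(np.arange(300), np.arange(300))
df = pd.DataFrame({"Easting": east.ravel(), "Northing": north.ravel()})
df["Rock"] = (df["Northing"] > 150 + 60 * np.sin(df["Easting"] / 50)).astype(int)

# Steps 1-2: draw 100 random samples, then split them into train and test sets.
df_model = df.sample(n=100, random_state=0)
X = df_model[["Easting", "Northing"]]
y = df_model["Rock"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Steps 3-4: one grid search per kernel; parameters of the SVC step are prefixed "svc__".
param_grids = {
    "linear": {"svc__C": [0.1, 1, 10, 100]},
    "poly": {"svc__C": [0.1, 1, 10, 100], "svc__degree": [2, 3]},
    "rbf": {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.01, 0.1, 1]},
    "sigmoid": {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.01, 0.1, 1]},
}
best_models, test_scores = {}, {}
for kernel, grid in param_grids.items():
    search = GridSearchCV(make_pipeline(StandardScaler(), SVC(kernel=kernel)),
                          grid, cv=3, scoring="accuracy")
    search.fit(X_train, y_train)
    best_models[kernel] = search.best_estimator_
    test_scores[kernel] = search.score(X_test, y_test)  # test-set accuracy of the refitted best model
    print(kernel, search.best_params_, round(test_scores[kernel], 3))

# Step 5: predict all 90,000 points; the pipeline scales them with the training-set scaler.
full_predictions = {k: m.predict(df[["Easting", "Northing"]]) for k, m in best_models.items()}
```

With real data, `df` would come from `pd.read_csv(...)` instead, and `full_predictions` holds one label array per kernel, ready to be drawn as a map.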

The dataset looks like this:

[Image: sample of the dataset]

There are 90,000 points in the same format.

Here is the code:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

### Importing Info

df = pd.read_csv("C:/Users/Admin/Desktop/RA/step 1/reference.csv", header=0)
df_model = df.sample(n = 100)
df_model.shape

## X-y split

X = df_model.loc[:,df_model.columns!="Rock"]
y = df_model["Rock"]
y_initial = df["Rock"]

### for whole dataset

X_wd = df.loc[:, df.columns!="Rock"]
y_wd = df["Rock"]

## Test-train split

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)

## Standardizing the Data

from sklearn.preprocessing import StandardScaler

sc = StandardScaler().fit(X_train)
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

## Linear
### Grid Search

from sklearn.model_selection import GridSearchCV
from sklearn import svm
from sklearn.metrics import accuracy_score, confusion_matrix

params_linear = {'C' : (0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500,1000)}
clf_svm_l = svm.SVC(kernel = 'linear')
svm_grid_linear = GridSearchCV(clf_svm_l, params_linear, n_jobs=-1,
                              cv = 3, verbose = 1, scoring = 'accuracy')

svm_grid_linear.fit(X_train_std, y_train)
svm_grid_linear.best_params_
linsvm_clf = svm_grid_linear.best_estimator_
accuracy_score(y_test, linsvm_clf.predict(X_test_std))

### training svm

clf_svm_l = svm.SVC(kernel = 'linear', C = 0.1)
clf_svm_l.fit(X_train_std, y_train)

### predicting model

y_train_pred_linear = clf_svm_l.predict(X_train_std)
y_test_pred_linear = clf_svm_l.predict(X_test_std)
y_test_pred_linear
clf_svm_l.n_support_

### whole dataset

y_pred_linear_wd = clf_svm_l.predict(sc.transform(X_wd))  # scale with the training-set scaler, as during training

### map
        


## Poly
### grid search for poly

params_poly = {'C' : (0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500,1000),
         'degree' : (1,2,3,4,5,6)}
clf_svm_poly = svm.SVC(kernel = 'poly')
svm_grid_poly = GridSearchCV(clf_svm_poly, params_poly, n_jobs = -1,
                            cv = 3, verbose = 1, scoring = 'accuracy')
svm_grid_poly.fit(X_train_std, y_train)
svm_grid_poly.best_params_
polysvm_clf = svm_grid_poly.best_estimator_
accuracy_score(y_test, polysvm_clf.predict(X_test_std))

### training svm

clf_svm_poly = svm.SVC(kernel = 'poly', C = 50, degree = 2)
clf_svm_poly.fit(X_train_std, y_train)

### predicting model

y_train_pred_poly = clf_svm_poly.predict(X_train_std)
y_test_pred_poly = clf_svm_poly.predict(X_test_std)

clf_svm_poly.n_support_

### whole dataset

y_pred_poly_wd = clf_svm_poly.predict(sc.transform(X_wd))  # scale with the training-set scaler, as during training

### map            


## RBF

### grid search rbf

params_rbf = {'C' : (0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50, 100, 500,1000),
         'gamma' : (0.001, 0.01, 0.1, 0.5, 1)}
clf_svm_r = svm.SVC(kernel = 'rbf')
svm_grid_r = GridSearchCV(clf_svm_r, params_rbf, n_jobs = -1,
                         cv = 10, verbose = 1, scoring = 'accuracy')
svm_grid_r.fit(X_train_std, y_train)
svm_grid_r.best_params_
rsvm_clf = svm_grid_r.best_estimator_
accuracy_score(y_test, rsvm_clf.predict(X_test_std))

### training svm

clf_svm_r = svm.SVC(kernel = 'rbf', C = 500, gamma = 0.5)
clf_svm_r.fit(X_train_std, y_train)

### predicting model

y_train_pred_r = clf_svm_r.predict(X_train_std)
y_test_pred_r = clf_svm_r.predict(X_test_std)

### whole dataset

y_pred_r_wd = clf_svm_r.predict(sc.transform(X_wd))  # scale with the training-set scaler, as during training

### map            


## Tangent

### grid search

params_tangent = {'C' : (0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1, 5, 10, 50),
         'gamma' : (0.001, 0.01, 0.1, 0.5, 1)}
clf_svm_tangent = svm.SVC(kernel = 'sigmoid')
svm_grid_tangent = GridSearchCV(clf_svm_tangent, params_tangent, n_jobs = -1,
                            cv = 10, verbose = 1, scoring = 'accuracy')
svm_grid_tangent.fit(X_train_std, y_train)
svm_grid_tangent.best_params_
tangentsvm_clf = svm_grid_tangent.best_estimator_
accuracy_score(y_test, tangentsvm_clf.predict(X_test_std))

### training svm

clf_svm_tangent = svm.SVC(kernel = 'sigmoid', C = 1, gamma = 0.1)
clf_svm_tangent.fit(X_train_std, y_train)

### predicting model

y_train_pred_tangent = clf_svm_tangent.predict(X_train_std)
y_test_pred_tangent = clf_svm_tangent.predict(X_test_std)

### whole dataset

y_pred_tangent_wd = clf_svm_tangent.predict(sc.transform(X_wd))  # scale with the training-set scaler, as during training

### map
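The empty "### map" steps could be filled with something like the sketch below, which turns each kernel's predicted labels for the full dataset back into a 2D map. It is a sketch under assumptions: the points lie on a complete regular Easting/Northing grid, the predictions were made on the full dataset scaled with the same scaler as the training data, and `plot_prediction_maps` is a hypothetical helper, not part of any library. `DataFrame.pivot` places each label by its coordinates, so the CSV row order does not matter.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_prediction_maps(df, predictions, scores=None):
    """Draw one map per kernel from predicted Rock labels on a regular Easting/Northing grid."""
    fig, axes = plt.subplots(2, 2, figsize=(10, 8), sharex=True, sharey=True)
    for ax, (kernel, pred) in zip(axes.ravel(), predictions.items()):
        # Pivot the flat rows into a 2D array: rows = Northing values, columns = Easting values.
        grid = (df[["Easting", "Northing"]].assign(pred=pred)
                .pivot(index="Northing", columns="Easting", values="pred"))
        ax.imshow(grid.to_numpy(), origin="lower", interpolation="nearest",
                  extent=(grid.columns.min(), grid.columns.max(), grid.index.min(), grid.index.max()))
        title = kernel if scores is None else f"{kernel} (test accuracy {scores[kernel]:.2f})"
        ax.set_title(title)
        ax.set_xlabel("Easting")
        ax.set_ylabel("Northing")
    fig.tight_layout()
    return fig  # call plt.show() afterwards in an interactive session


# Hypothetical demo: a 50 x 40 grid with made-up labels standing in for the four kernels' predictions.
east, north = np.meshgrid(np.arange(50), np.arange(40))
demo = pd.DataFrame({"Easting": east.ravel(), "Northing": north.ravel()})
fake = {k: (demo["Northing"] > 10 + 5 * i).to_numpy().astype(int)
        for i, k in enumerate(["linear", "poly", "rbf", "sigmoid"])}
fig = plot_prediction_maps(demo, fake)
```

With the question's variables, the call would be `plot_prediction_maps(df, {"linear": y_pred_linear_wd, "poly": y_pred_poly_wd, "rbf": y_pred_r_wd, "sigmoid": y_pred_tangent_wd})`, optionally passing the four test accuracies as `scores`.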

From your sample data, it looks like you are dealing with regularly spaced data, and the rows/columns are iterated in a monotonically increasing order. Here is one way to reshape this dataset into a 2D array (one row per Northing value) and plot it:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# create sample data
data = {
    'Easting': [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3],
    'Northing': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
    'Rocks': [0, 0, 1, 0, 0, 2, 0, 0, 0, 1, 0, 0],
}
df = pd.DataFrame(data)

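# reshape into a 2D matrix: one row per Northing value, one column per Easting value
# (assumes the rows are sorted by Northing, then Easting, and both step from 0 to their max value)
n_eastings = df['Easting'].max() + 1
img_data = np.reshape(df['Rocks'].to_numpy(), (-1, n_eastings))

# plot as an image, with Northing increasing upwards
plt.imshow(img_data, origin='lower')
plt.show()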
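The reshape above relies on the CSV rows already being in Northing-then-Easting order. A slightly more defensive variant, sketched below, sorts first and checks that the grid is complete before reshaping; `grid_from_points` is a hypothetical helper name, and the sample rows are deliberately shuffled to show that the input order no longer matters.

```python
import numpy as np
import pandas as pd


def grid_from_points(df, value_col):
    """Reshape regularly spaced (Easting, Northing, value) rows into a 2D array plus imshow extent.

    Sorting removes the dependence on row order; the size check catches grids with
    missing points, where a plain reshape would silently misalign the rows.
    """
    eastings = np.unique(df["Easting"])
    northings = np.unique(df["Northing"])
    if len(df) != len(eastings) * len(northings):
        raise ValueError("points do not form a complete regular grid")
    ordered = df.sort_values(["Northing", "Easting"])
    img = ordered[value_col].to_numpy().reshape(len(northings), len(eastings))
    extent = (eastings.min(), eastings.max(), northings.min(), northings.max())
    return img, extent


sample = pd.DataFrame({
    "Easting": [0, 1, 2, 3] * 3,
    "Northing": [0] * 4 + [1] * 4 + [2] * 4,
    "Rocks": [0, 0, 1, 0, 0, 2, 0, 0, 0, 1, 0, 0],
}).sample(frac=1, random_state=1)  # shuffle the rows on purpose
img, extent = grid_from_points(sample, "Rocks")
# plt.imshow(img, origin="lower", extent=extent) then draws the map with north up.
```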

If you are dealing with irregularly spaced data, i.e. not every Easting/Northing combination has a value, you might look into plotting irregularly spaced data.
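For irregularly spaced points, one option is to resample them onto a regular grid before plotting. The sketch below uses SciPy's `griddata` with `method="nearest"`. Nearest-neighbour lookup is chosen because it only ever returns existing labels, whereas linear interpolation would blend class codes into meaningless intermediate values. The helper name `nearest_label_grid`, the grid resolution and the random demo points are hypothetical.

```python
import numpy as np
from scipy.interpolate import griddata


def nearest_label_grid(easting, northing, labels, n=200):
    """Resample scattered labelled points onto an n x n regular grid by nearest neighbour."""
    xi = np.linspace(easting.min(), easting.max(), n)
    yi = np.linspace(northing.min(), northing.max(), n)
    gx, gy = np.meshgrid(xi, yi)  # rows follow Northing, columns follow Easting
    grid = griddata(np.column_stack([easting, northing]), labels, (gx, gy), method="nearest")
    return grid, (xi[0], xi[-1], yi[0], yi[-1])


# Hypothetical scattered points: 500 random locations labelled by which side of a line they fall on.
rng = np.random.default_rng(0)
east_pts = rng.uniform(0, 1000, 500)
north_pts = rng.uniform(0, 800, 500)
labels = (north_pts > 0.5 * east_pts + 100).astype(int)
grid, extent = nearest_label_grid(east_pts, north_pts, labels)
# plt.imshow(grid, origin="lower", extent=extent) then draws the resampled map.
```

Note that nearest-neighbour resampling also fills areas far from any sample, so large gaps in the data will appear as confidently labelled regions.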

Here is how I plotted the linear kernel, for anyone who runs into the same problem. It should be easy to adapt this code to the other kernels.

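# Visualising the training-set results (assumes X_train_std has exactly two columns: Easting and Northing)
from matplotlib.colors import ListedColormap
X_set, y_set = X_train_std, y_train.to_numpy()
# dense mesh covering the standardized training data with a margin of 1
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 1, stop = X_set[:, 0].max() + 1, step = 0.01),
                     np.arange(start = X_set[:, 1].min() - 1, stop = X_set[:, 1].max() + 1, step = 0.01))
# colour every mesh point by the class the fitted linear SVM predicts for it
# (two colours, so this assumes two rock classes; add colours for more classes)
plt.contourf(X1, X2, clf_svm_l.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('darkblue', 'yellow')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
# overlay the training samples, one colour per class
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1],
                color = ListedColormap(('blue', 'gold'))(i), label = j)
plt.title('SVM (Training set)')
plt.xlabel('Easting (standardized)')
plt.ylabel('Northing (standardized)')
plt.legend()
plt.show()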
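Adapting the plot above to the other kernels mostly means wrapping it in a function and calling it once per fitted model. The sketch below does that, drawing on a given `Axes` so the four kernels share one figure, and picking one colour per class from the `tab10` colormap so it also works with more than two rock types. Like the code above, it shows decision regions over the standardized training data, not the 90,000-point map. `plot_decision_regions`, the random training data and the kernel parameters are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap
from sklearn.svm import SVC


def plot_decision_regions(ax, clf, X, y, title, step=0.02):
    """Shade the classifier's predicted class over a mesh and overlay the two-feature samples X."""
    y = np.asarray(y)
    classes = np.unique(y)
    colours = plt.get_cmap("tab10").colors[: len(classes)]  # one colour per class (up to 10)
    x1, x2 = np.meshgrid(np.arange(X[:, 0].min() - 1, X[:, 0].max() + 1, step),
                         np.arange(X[:, 1].min() - 1, X[:, 1].max() + 1, step))
    z = clf.predict(np.c_[x1.ravel(), x2.ravel()]).reshape(x1.shape)
    z_idx = np.searchsorted(classes, z)  # map labels to 0..k-1 so they line up with the colours
    ax.contourf(x1, x2, z_idx, levels=np.arange(len(classes) + 1) - 0.5,
                cmap=ListedColormap(colours), alpha=0.4)
    for i, c in enumerate(classes):
        ax.scatter(X[y == c, 0], X[y == c, 1], color=colours[i], edgecolor="k", label=str(c))
    ax.set_title(title)
    ax.set_xlabel("Easting (standardized)")
    ax.set_ylabel("Northing (standardized)")
    ax.legend()


# Hypothetical standardized training data with three made-up rock classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2))
y = np.where(X[:, 0] + X[:, 1] > 0.5, 2, np.where(X[:, 1] > -0.5, 1, 0))
models = {
    "linear": SVC(kernel="linear", C=1),
    "poly": SVC(kernel="poly", degree=2, C=10),
    "rbf": SVC(kernel="rbf", C=10, gamma=0.5),
    "sigmoid": SVC(kernel="sigmoid", C=1, gamma=0.1),
}
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
for ax, (name, clf) in zip(axes.ravel(), models.items()):
    plot_decision_regions(ax, clf.fit(X, y), X, y, f"SVM {name} (training set)")
fig.tight_layout()
```

With the question's variables, the loop would instead iterate over the already-fitted `clf_svm_l`, `clf_svm_poly`, `clf_svm_r` and `clf_svm_tangent` with `X_train_std` and `y_train`.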
