
LinearSVC: Equation of a straight line that separates two classes from a scatterplot graph and pandas DataFrame

I am trying to create a straight line that separates two classes. I am using a pandas DataFrame together with a scatterplot.

Before getting to the problem, here is my code:

Libraries:

import pandas as pd
from sklearn.datasets import load_iris
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.svm import LinearSVC
from sklearn.metrics import ConfusionMatrixDisplay
from scipy.io import arff

Data:

arquivo_arff = arff.loadarff(r"/content/Rice_MSC_Dataset.arff")
dados = pd.DataFrame(arquivo_arff[0])

Filter:

dados = dados[['MINOR_AXIS', 'MAJOR_AXIS', 'CLASS']]

Another filter:

dados = dados[dados['CLASS'].isin([b"Arborio", b"Ipsala"])]

Plot with the two parameters:

sns.scatterplot(
    data=dados, 
    x="MINOR_AXIS", 
    y="MAJOR_AXIS", 
    hue="CLASS")
plt.show()

My problem is here, when I use LinearSVC to find the parameters and coefficients of the equation:

model = LinearSVC()
model.fit(dados.drop('CLASS', axis=1), dados['CLASS'])

a, b = model.coef_[0]
d = model.intercept_[0]

print('a:', a)
print('b:', b)
print('d:', d)

You appear to be using a legacy multi-label data representation. Sequence of sequences are no longer supported; use a binary array or sparse matrix instead - the MultiLabelBinarizer transformer can convert to this format.

I don't really understand that error. Is there any way to fix this in my code?

The MultiLabelBinarizer documentation has some good examples for specific uses, but the general workflow for sklearn transformers is:

Split the data into features and labels

X = dados.drop('CLASS', axis=1)
y = dados['CLASS']

#optionally, use train_test_split to split data into training and validation sets
#X_train,X_test,y_train,y_test=train_test_split(X,y)

Transform the input and target data

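from sklearn.preprocessing import MultiLabelBinarizer

mb = MultiLabelBinarizer()
mb.fit(y)
y_bin = mb.transform(y)  # keep the result; transform() does not modify y in place
# can also be done in one step with y_bin = mb.fit_transform(y)
# if using train_test_split: mb.fit_transform(y_train); mb.transform(y_test)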
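
In this particular case the error may have a simpler cause than a real multi-label target. scipy's arff.loadarff appears to return nominal attributes as byte strings (note the b"Arborio" in the question's filter), and a bytes object is itself a sequence, which is probably what scikit-learn's target check objects to. The sketch below is only an illustration under that assumption: looks_like_legacy_multilabel is a hypothetical helper that approximates the check, not scikit-learn's actual code.

```python
from collections.abc import Sequence

# Labels in the form the ARFF loader seems to produce for nominal columns: bytes, not str.
raw_labels = [b"Arborio", b"Ipsala", b"Arborio"]


def looks_like_legacy_multilabel(value):
    """Hypothetical approximation of a check that rejects targets whose
    elements are sequences other than str (an assumption, see above)."""
    return isinstance(value, Sequence) and not isinstance(value, str)


# A bytes object counts as a sequence (of integers), so it trips the check.
print(looks_like_legacy_multilabel(raw_labels[0]))

# Decoding turns each label into a plain str, i.e. an ordinary class label.
labels = [label.decode("utf-8") for label in raw_labels]
print(looks_like_legacy_multilabel(labels[0]))
print(labels)
```

With the DataFrame from the question, the equivalent step would be something like dados['CLASS'] = dados['CLASS'].str.decode('utf-8'), applied before X and y are built. If that is indeed the cause, no binarizer is needed for a two-class target.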

Fit your model

model = LinearSVC()
model.fit(X,y) #or model.fit(X_train,y_train) if using training and validation sets
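
Once the fit succeeds, the coefficients give the separating line directly. For two features the decision boundary is a*x + b*y + d = 0, where a and b come from model.coef_[0] and d from model.intercept_[0], so solving for y gives the slope and intercept. A minimal sketch, using a hypothetical helper boundary_line and made-up coefficient values rather than results from the rice data:

```python
def boundary_line(a, b, d):
    """Solve a*x + b*y + d = 0 for y and return (slope, intercept)."""
    if b == 0:
        # The boundary is vertical (x = -d / a) and cannot be written as y = m*x + c.
        raise ValueError("b is zero: the boundary is a vertical line")
    return -a / b, -d / b


# Hypothetical values standing in for model.coef_[0] and model.intercept_[0].
a, b, d = 0.5, -0.25, 1.0
slope, intercept = boundary_line(a, b, d)
print(f"MAJOR_AXIS = {slope:.3f} * MINOR_AXIS + {intercept:.3f}")
```

Here x is MINOR_AXIS and y is MAJOR_AXIS, matching the column order of X in the question. The resulting slope and intercept can be evaluated over a range of MINOR_AXIS values and passed to plt.plot to draw the line on top of the scatterplot.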

