How to change the input shape for LogisticRegression's fit function?

Question

I am trying to apply LogisticRegression to my dataset.

I have split the data into training, test and validation sets, and I am one-hot encoding the data. I am getting the following error:

ValueError: bad input shape (527, 2)

Here is my code:

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

#read the data
train_data = pd.read_csv('ip2ttt_train.data', header=None)
test_data = pd.read_csv('ip2ttt_test.data', header=None)
valid_data = pd.read_csv('ip2ttt_valid.data', header=None)

#for valid dataset
valid_label = valid_data[9]
valid_features = valid_data.drop(columns=9)

#for test dataset
test_label = test_data[9]
test_features = test_data.drop(columns=9)

#for train dataset
train_label = train_data[9]
train_features = train_data.drop(columns=9)

X_valid = pd.get_dummies(valid_features)
y_valid = pd.get_dummies(valid_label)

X_test = pd.get_dummies(test_features)
y_test = pd.get_dummies(test_label)

X_train = pd.get_dummies(train_features)
y_train = pd.get_dummies(train_label)

clf = LogisticRegression(random_state=0, multi_class='multinomial', solver='newton-cg', penalty='l2') # newton-cg and lbfgs support only penalty='l2' (or no penalty); an L1 penalty needs solver='liblinear' or 'saga'

clf.fit(X_train, y_train)

Here are the shapes of X and y:

X_train.shape
(527, 27)
y_train.shape
(527, 2)
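The second dimension of y_train comes from pd.get_dummies: applied to a label Series, it creates one indicator column per class, so two classes give two columns. A minimal sketch with three made-up labels (the values are hypothetical, not taken from ip2ttt_train.data):

```python
import pandas as pd

# Hypothetical stand-in for train_label; the real Series has 527 rows
labels = pd.Series(["positive", "positive", "negative"], name=9)

# get_dummies turns ONE label column into one indicator column PER CLASS
y_onehot = pd.get_dummies(labels)
print(y_onehot.shape)          # (3, 2)
print(list(y_onehot.columns))  # ['negative', 'positive']

# The Series itself keeps the 1-D shape that fit() expects for y
print(labels.shape)            # (3,)
```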

What I have tried:

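I figured out that I need to change the shape of y_train. I tried converting y_train to an np.array and calling flatten(), but it did not work. I think I need a (527, 1) shape. I also tried reshape([527,1]), but it gave me an error. I know that the documentation says:

y : array-like of shape (n_samples,)

Target vector relative to X.

but I don't know how to implement this correctly.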
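Why these attempts fail can be checked with numpy alone. A sketch using a synthetic label Series of length 527 (the 300/227 split between classes is made up): flatten() keeps all 527 × 2 = 1054 values, and reshape([527, 1]) cannot discard half of them. The documented shape (n_samples,) is a 1-D array, which is what the original label column already is.

```python
import numpy as np
import pandas as pd

# Synthetic labels with the same length as train_label (hypothetical 300/227 split)
labels = pd.Series(["positive"] * 300 + ["negative"] * 227)

# One-hot encoding the labels reproduces the problematic y_train shape
y_onehot = pd.get_dummies(labels).to_numpy()
print(y_onehot.shape)            # (527, 2)

# flatten() keeps every element: 527 * 2 = 1054 values, not 527
print(y_onehot.flatten().shape)  # (1054,)

# reshape cannot drop elements, so 1054 values do not fit into 527 x 1
try:
    y_onehot.reshape([527, 1])
except ValueError as err:
    print("reshape failed:", err)

# (n_samples,) means a 1-D array: exactly what the label column already is
y = labels.to_numpy()
print(y.shape)                   # (527,)

# A (527, 1) column vector is 2-D; ravel() turns it back into (527,)
column = y.reshape(-1, 1)
print(column.shape, column.ravel().shape)  # (527, 1) (527,)
```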

Update: sample data for train_label:

0      positive
1      positive
2      positive
3      positive
4      positive
         ...   
522    negative
523    negative
524    negative
525    negative
526    negative
Name: 9, Length: 527, dtype: object

Sample data for train_features:

    0   1   2   3   4   5   6   7   8
0   x   x   x   x   o   o   x   o   o
1   x   x   x   x   o   o   o   x   o
2   x   x   x   x   o   o   b   o   b
3   x   x   x   x   o   b   o   o   b
4   x   x   x   x   b   o   o   b   o
... ... ... ... ... ... ... ... ... ...
522 x   o   x   o   o   x   x   x   o
523 o   x   x   x   o   o   x   o   x
524 o   x   x   x   o   o   o   x   x
525 o   x   o   x   x   o   x   o   x
526 o   x   o   x   o   x   x   o   x

I tried passing them to fit() without one-hot encoding and got the error: ValueError: could not convert string to float: 'x'
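The features, unlike the labels, do need encoding, because LogisticRegression only works with numeric X. A sketch with three hypothetical boards, chosen so that every square takes each of 'x', 'o' and 'b' once: each of the 9 squares becomes 3 indicator columns, which is consistent with the 27 columns of X_train (assuming every value occurs in every square of the real training split).

```python
import pandas as pd

# Three hypothetical boards in the train_features layout: columns 0..8 are the
# squares, values are 'x', 'o' or 'b' (blank). Every square sees all three values.
rows = [list("xobxobxob"), list("obxobxobx"), list("bxobxobxo")]
features = pd.DataFrame(rows)
print(features.shape)  # (3, 9)

# get_dummies replaces each string column with one indicator column per value,
# named like "0_b", "0_o", "0_x" for square 0
X = pd.get_dummies(features)
print(X.shape)         # (3, 27): 9 squares x 3 values
```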

Answer 1

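I am one-hot encoding the data.

This should not be done to the labels when using scikit-learn's LogisticRegression; as the very documentation you quoted says:

y : array-like of shape (n_samples,)

Target vector relative to X.

All your labels (train, validation, test) need an (n_samples,) shape. You should remove all the pd.get_dummies() commands used to define y_train, y_valid and y_test, and use train_label, valid_label and test_label respectively.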
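Putting the answer together, here is a minimal sketch of the corrected training step. Because the ip2ttt_*.data files are not available here, it generates random boards and labels them with a made-up rule ('positive' when the centre square is 'x'); only the features go through pd.get_dummies, and the string labels are passed to fit() unchanged. multi_class is omitted just to keep the sketch short.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the ip2ttt files (hypothetical data): 60 random boards
rng = np.random.default_rng(0)
train_data = pd.DataFrame(rng.choice(["x", "o", "b"], size=(60, 9)))
# Hypothetical labelling rule for column 9, only so there is something to learn
train_data[9] = np.where(train_data[4] == "x", "positive", "negative")

train_label = train_data[9]                  # 1-D, shape (n_samples,)
train_features = train_data.drop(columns=9)

X_train = pd.get_dummies(train_features)     # features: one-hot encoded
y_train = train_label                        # labels: plain strings, no get_dummies

clf = LogisticRegression(random_state=0, solver="newton-cg", penalty="l2")
clf.fit(X_train, y_train)

print(clf.classes_)                   # ['negative' 'positive']
print(clf.predict(X_train.iloc[:5]))  # predictions come back as the original strings
```

LogisticRegression handles string class labels itself, so clf.classes_ and clf.predict() use the original strings; valid_label and test_label can likewise be passed as-is to clf.score().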

Answer 1: score 2, accepted, 2020-05-17 01:05:02