TypeError: train_test_split() got multiple values for argument 'test_size' only when I write it in a function
TypeError: train_test_split() got an unexpected keyword argument 'test_size'
I am trying to find the best feature set using a random forest, and I need to split the dataset into training and test sets. Here is my code:
    from sklearn.model_selection import train_test_split

    def train_test_split(x, y):
        # split data: train 70 % and test 30 %
        x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
        # normalization
        x_train_N = (x_train - x_train.mean()) / (x_train.max() - x_train.min())
        x_test_N = (x_test - x_test.mean()) / (x_test.max() - x_test.min())

    train_test_split(data, data_y)
The arguments data and data_y are passed in correctly, but I get the error quoted in the title. I can't figure out why.
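The function you define has the same name as the function you import from sklearn.model_selection. Your definition replaces the imported one, so the call inside the function body reaches your own two-argument function, which does not accept test_size. Renaming your function fixes it. Something like this: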
    from sklearn.model_selection import train_test_split

    # renamed so it no longer shadows the imported train_test_split
    def my_train_test_split(x, y):
        # split data: train 70 % and test 30 %
        x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
        # normalization
        x_train_N = (x_train - x_train.mean()) / (x_train.max() - x_train.min())
        x_test_N = (x_test - x_test.mean()) / (x_test.max() - x_test.min())
        return x_train_N, x_test_N, y_train, y_test

    x_train_N, x_test_N, y_train, y_test = my_train_test_split(data, data_y)
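Explanation: Python does not overload functions by argument type. A second definition with the same name simply rebinds that name, so after your def the imported function is no longer reachable as train_test_split. Giving your function a different name is the simplest fix, IMO.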
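The name clash can be reproduced without scikit-learn. The following is a minimal sketch, not the asker's code: `textwrap.shorten` is only a stand-in for the imported `train_test_split`, and `imported_shorten` is a hypothetical extra name used to show that the library function itself is unchanged.

```python
from textwrap import shorten  # stand-in for the sklearn import

imported_shorten = shorten  # keep a second reference to the library function


def shorten(text):  # same name as the import: this rebinds `shorten`
    # This call resolves to the one-argument function defined here,
    # not to textwrap.shorten, so `width` is an unknown keyword.
    return shorten(text, width=10)


try:
    shorten("a fairly long example sentence")
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# The library function is untouched and still works through the other name.
print(imported_shorten("a fairly long example sentence", width=10))
```

The TypeError has the same shape as the one in the question: the inner call never reaches the library, it lands on the user-defined function.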
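Another solution is to import the sklearn function under an alias, which resolves the conflict between sklearn.model_selection.train_test_split and your own train_test_split (the default name):

    from sklearn.model_selection import train_test_split as sklearn_train_test_split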
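With the alias in place, the user-defined function can keep its original name. A minimal sketch, assuming NumPy arrays as input; the toy `data` and `data_y` arrays are made up for illustration and the normalization step from the question is left out for brevity.

```python
import numpy as np
from sklearn.model_selection import train_test_split as sklearn_train_test_split


def train_test_split(x, y):
    # The alias keeps the library function reachable even though this
    # function reuses the name train_test_split.
    x_train, x_test, y_train, y_test = sklearn_train_test_split(
        x, y, test_size=0.3, random_state=42
    )
    return x_train, x_test, y_train, y_test


# Hypothetical toy data: 10 samples with 2 features each.
data = np.arange(20, dtype=float).reshape(10, 2)
data_y = np.arange(10)

x_train, x_test, y_train, y_test = train_test_split(data, data_y)
print(x_train.shape, x_test.shape)
```

Renaming your own function is usually easier to read later, since anyone who sees train_test_split in the code will expect the scikit-learn function.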