Scikit learn ComplementNB is outputting NaN for scores
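I have an imbalanced binary dataset with 23 features: 92,000 rows are labeled 0 and 207,000 rows are labeled 1.
I have trained models on this dataset, such as GaussianNB, DecisionTreeClassifier, and several other classifiers from scikit learn, and they all run fine.
I want to run ComplementNB on this dataset, but when I do, all of the scores come out as NaN.
Here is my code: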
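import time

import pandas as pd
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.naive_bayes import ComplementNB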
features = [
    # Chest accelerometer sensor
    'chest_accel_x', 'chest_accel_y', 'chest_accel_z',
    # ECG (2 leads)
    'ecg_1', 'ecg_2',
    # Left ankle sensors
    'left_accel_x', 'left_accel_y', 'left_accel_z',
    'left_gyro_x', 'left_gyro_y', 'left_gyro_z',
    'left_mag_x', 'left_mag_y', 'left_mag_z',
    # Right lower arm sensors
    'right_accel_x', 'right_accel_y', 'right_accel_z',
    'right_gyro_x', 'right_gyro_y', 'right_gyro_z',
    'right_mag_x', 'right_mag_y', 'right_mag_z',
]
df = pd.read_csv('mhealth_s_m.csv')
X = df[features]
y = df['label']
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size = 0.2, random_state = 69)
def K_fold_unbalanced(train_X, train_y):
    # Metrics reported for every fold, on both the training and the test part
    scoring = ['accuracy', 'f1', 'precision', 'recall', 'roc_auc']
    print('Unbalanced Data')
    model = ComplementNB()
    start_time = time.time()
    scores = cross_validate(model, train_X, train_y, scoring = scoring, cv = 5, return_train_score = True)
    print(scores)
    print('Took', time.time() - start_time, 'to run')
    print('=======================================')
K_fold_unbalanced(train_X, train_y)
The output is:
train accuracy nan
train f1 nan
train precision nan
train recall nan
train roc auc nan
test accuracy nan
test f1 nan
test precision nan
test recall nan
test roc auc nan
Took 0.12271976470947266 to run
Any idea why all the values are NaN? My data can be found here
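This fixed it (scale the features to the [0, 1] range before fitting):
from sklearn.preprocessing import MinMaxScaler
# Fit the scaler on the training split only, then reuse it for the test split
scaler = MinMaxScaler()
train_X = scaler.fit_transform(train_X)
test_X = scaler.transform(test_X)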
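The likely reason scaling helps: ComplementNB, like MultinomialNB, refuses to fit on negative feature values, and raw accelerometer, gyroscope and magnetometer readings usually contain some. cross_validate has error_score=np.nan by default, so depending on the scikit-learn version a failed fit either shows up as NaN scores with a warning or as an exception. The sketch below illustrates this on small synthetic data, which is an assumption standing in for the mhealth file. It also puts the scaler inside a pipeline, a variation on the fix above that lets each fold fit its own scaler.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import ComplementNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the sensor data (hypothetical, not the real file):
# features are centred on zero, so roughly half of the values are negative.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] > 0).astype(int)

# 1) Fitting on the raw data fails because of the negative values.
try:
    ComplementNB().fit(X, y)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print("fit on raw data:", error_message)

# 2) Scaling to [0, 1] inside a pipeline: every CV fold fits the scaler
#    on its own training part, then fits ComplementNB on the scaled data.
scaled_model = make_pipeline(MinMaxScaler(), ComplementNB())
scaled_scores = cross_validate(scaled_model, X, y, cv=5, scoring="accuracy")
print("accuracy per fold:", scaled_scores["test_score"])
```

To see the underlying exception in the original code instead of NaN, passing error_score="raise" to cross_validate should make the first failing fit raise directly.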