Scikit-learn ComplementNB is outputting NaN for scores
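I have an imbalanced binary dataset with 23 features: 92,000 rows are labeled 0 and 207,000 rows are labeled 1.
I have trained models on this dataset, such as GaussianNB, DecisionTreeClassifier, and a few other classifiers from scikit-learn, and they all work fine.
I want to run ComplementNB on this dataset, but when I do, all the scores come out as NaN.
Below is my code: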
import time

import pandas as pd
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.naive_bayes import ComplementNB

features = [
    # Chest accelerometer sensor
    'chest_accel_x', 'chest_accel_y', 'chest_accel_z',
    # ECG (2 leads)
    'ecg_1', 'ecg_2',
    # Left ankle sensors
    'left_accel_x', 'left_accel_y', 'left_accel_z',
    'left_gyro_x', 'left_gyro_y', 'left_gyro_z',
    'left_mag_x', 'left_mag_y', 'left_mag_z',
    # Right lower arm sensors
    'right_accel_x', 'right_accel_y', 'right_accel_z',
    'right_gyro_x', 'right_gyro_y', 'right_gyro_z',
    'right_mag_x', 'right_mag_y', 'right_mag_z',
]

df = pd.read_csv('mhealth_s_m.csv')
X = df[features]
y = df['label']
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=69)

def K_fold_unbalanced(train_X, train_y):
    # Metrics to compute on every fold
    scoring = ['accuracy', 'f1', 'precision', 'recall', 'roc_auc']
    print('Unbalanced Data')
    model = ComplementNB()
    start_time = time.time()
    # 5-fold cross-validation, reporting train and test scores
    scores = cross_validate(model, train_X, train_y, scoring=scoring, cv=5, return_train_score=True)
    print(scores)
    print('Took', time.time() - start_time, 'to run')
    print('=======================================')

K_fold_unbalanced(train_X, train_y)
The output is:
train accuracy nan
train f1 nan
train precision nan
train recall nan
train roc auc nan
test accuracy nan
test f1 nan
test precision nan
test recall nan
test roc auc nan
Took 0.12271976470947266 to run
Any idea why all the values are NaN? My data can be found here.
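One way to find out what is going wrong is to ask cross_validate to re-raise the exception from a failing fit instead of recording NaN, by passing error_score='raise'. The sketch below is only an illustration under stated assumptions: X_demo and y_demo are synthetic zero-centred data standing in for the sensor readings, not the actual CSV.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import ComplementNB

# Synthetic stand-in for the sensor data (an assumption, not the asker's CSV):
# zero-centred readings, so roughly half of the values are negative.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = rng.integers(0, 2, size=200)

# error_score='raise' surfaces the exception thrown inside fit()
# instead of silently replacing the fold's score.
try:
    cross_validate(ComplementNB(), X_demo, y_demo, cv=5, error_score='raise')
    fit_error = None
except ValueError as exc:
    fit_error = str(exc)

print(fit_error)
```

If the real data behave like this stand-in, the message should point at negative values in the input. Note that recent scikit-learn releases already raise an error on their own when every fit fails, so the silent all-NaN result may depend on the installed version.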
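ComplementNB does not accept negative feature values, and raw accelerometer, gyroscope and magnetometer readings typically include them, so every fit inside cross_validate most likely failed and was scored as NaN. Scaling the features to a non-negative range fixed it: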
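from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()  # rescales each feature to the range [0, 1]
train_X = scaler.fit_transform(train_X)  # learn min/max on the training split only
test_X = scaler.transform(test_X)  # reuse the training min/max on the test split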
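When the scores come from cross_validate, it may be cleaner to put the scaler and the classifier into one pipeline, so that the scaler is refit on the training part of each fold rather than on all of the training data at once. This is a minimal sketch, not the original poster's code: X_demo and y_demo are synthetic placeholders, and clip=True is an optional choice that keeps validation values inside [0, 1] even when they fall outside the range seen during fitting.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import ComplementNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic placeholder data with negative values (an assumption for the demo).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 4))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

# The scaler is fit inside each cross-validation fold, then ComplementNB
# is trained on the rescaled, non-negative features.
model = make_pipeline(MinMaxScaler(clip=True), ComplementNB())

scores = cross_validate(
    model, X_demo, y_demo,
    scoring=['accuracy', 'f1'], cv=5, return_train_score=True,
)
print({name: values.mean() for name, values in scores.items()})
```

With the real data, train_X and train_y would take the place of X_demo and y_demo, and the scoring list from the question could be passed unchanged.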