Scikit-learn ComplementNB is outputting NaN for scores

I have an unbalanced binary dataset with 23 features: 92,000 rows are labeled 0 and 207,000 rows are labeled 1.

I trained several models on this dataset, such as GaussianNB, DecisionTreeClassifier, and a few other scikit-learn classifiers, and they all work fine.

I want to run ComplementNB on this dataset, but when I do, all the scores come out as NaN.

Below is my code:

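import time

import pandas as pd
from sklearn.model_selection import cross_validate, train_test_split
from sklearn.naive_bayes import ComplementNB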
features = [
            # Chest accelerometer sensor
            'chest_accel_x', 'chest_accel_y', 'chest_accel_z',
    
            # ECG (2 leads)
            'ecg_1', 'ecg_2',

            # Left ankle sensors
            'left_accel_x', 'left_accel_y', 'left_accel_z',
            'left_gyro_x', 'left_gyro_y', 'left_gyro_z',
            'left_mag_x', 'left_mag_y', 'left_mag_z',

            # Right lower arm sensors
            'right_accel_x', 'right_accel_y', 'right_accel_z',
            'right_gyro_x', 'right_gyro_y', 'right_gyro_z',
            'right_mag_x', 'right_mag_y', 'right_mag_z',
        ]
df = pd.read_csv('mhealth_s_m.csv')
X = df[features]
y = df['label']
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size = 0.2, random_state = 69)
def K_fold_unbalanced(train_X, train_y):
        scoring = ['accuracy', 'f1', 'precision', 'recall', 'roc_auc']
        print('Unbalanced Data')
        model = ComplementNB()
        start_time = time.time()
        scores = cross_validate(model, train_X, train_y, scoring = scoring, cv = 5, return_train_score = True)
        print(scores)
        print('Took', time.time() - start_time, 'to run')
        print('=======================================')
K_fold_unbalanced(train_X, train_y)

Output is:

train accuracy nan
train f1 nan
train precision nan
train recall nan
train roc auc nan

test accuracy nan
test f1 nan
test precision nan
test recall nan
test roc auc nan
Took 0.12271976470947266 to run

Any ideas why all the values are NaN? My data can be found here.

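This fixed it. ComplementNB rejects negative feature values, so rescaling the features to a non-negative range lets the fits succeed: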

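from sklearn.preprocessing import MinMaxScaler

# Rescale every feature to [0, 1] so no value passed to ComplementNB is negative.
scaler = MinMaxScaler()
train_X = scaler.fit_transform(train_X)  # learn min/max from the training split only
test_X = scaler.transform(test_X)        # apply the same min/max to the test split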
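
For context, here is a hedged sketch of what is probably going on. ComplementNB raises a ValueError when it is fitted on data containing negative values, and cross_validate, with its default error_score=np.nan, records NaN for every failed fold instead of stopping (newer scikit-learn releases may raise an error when every fold fails). The example uses synthetic signed data as a stand-in for the sensor readings, which is an assumption since the real file is not reproduced here, and all variable names are illustrative. It also puts MinMaxScaler and ComplementNB in one pipeline so the scaler is refitted on each training fold during cross-validation.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import ComplementNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the sensor data (assumption): signed values, binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 23))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Fitting directly on signed data fails; inside cross_validate this same
# failure is what gets recorded as NaN for each fold.
try:
    ComplementNB().fit(X, y)
    fit_error = None
except ValueError as exc:
    fit_error = str(exc)
print("Direct fit error:", fit_error)

# Scale inside a pipeline so each fold fits the scaler on its own training part.
# clip=True (available in recent scikit-learn versions) keeps held-out values
# inside [0, 1] even if they fall outside the training range.
pipeline = make_pipeline(MinMaxScaler(clip=True), ComplementNB())
scores = cross_validate(pipeline, X, y, scoring=["accuracy", "roc_auc"], cv=5)
print("Test accuracy per fold:", scores["test_accuracy"])
print("Test ROC AUC per fold:", scores["test_roc_auc"])
```

Passing error_score='raise' to cross_validate on the original, unscaled data should surface the underlying exception instead of silently returning NaN.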
