
imblearn SMOTE+ENN under-sampled the majority class

I have an imbalanced dataset, and when I try to balance it using SMOTEENN, the count of the majority class decreases by roughly half.

I tried changing the 'sampling_strategy' parameter, using all of the provided options, but it did not help.

from collections import Counter

from imblearn.combine import SMOTEENN

sme = SMOTEENN()
X_res, y_res = sme.fit_resample(X_train, y_train)

print(f'Original train dataset shape: {Counter(y_train)}')
# Original train dataset shape: Counter({1: 2194, 0: 205})

print(f'Resampled train dataset shape: {Counter(y_res)}\n')
# Resampled train dataset shape: Counter({0: 2117, 1: 1226})

If you look at the SMOTEENN documentation ( https://imbalanced-learn.readthedocs.io/en/stable/generated/imblearn.combine.SMOTEENN.html#imblearn.combine.SMOTEENN ):

Combine over- and under-sampling using SMOTE and Edited Nearest Neighbours.

In other words, SMOTE first over-samples the minority class, and Edited Nearest Neighbours then removes samples whose neighbours do not agree with their label. By default this cleaning step is applied to every class, so the majority class shrinks as well.

If you want the same number of samples in each class, you can try another technique such as over_sampling.SMOTE.

For example:

from sklearn.datasets import make_classification
from imblearn.combine import SMOTEENN
from imblearn.over_sampling import SMOTE
from collections import Counter

X, y = make_classification(n_samples=5000, n_features=2, n_informative=2,
                           n_redundant=0, n_repeated=0, n_classes=2,
                           n_clusters_per_class=1,
                           weights=[0.06, 0.94],
                           class_sep=0.1, random_state=0)


sme = SMOTEENN()
X_res, y_res = sme.fit_resample(X, y)

print(f'Original train dataset shape: {Counter(y)}')
# Original train dataset shape: Counter({1: 4679, 0: 321})

print(f'Resampled train dataset shape: {Counter(y_res)}\n')
# Resampled train dataset shape: Counter({0: 3561, 1: 3246})

sm = SMOTE()
X_res, y_res = sm.fit_resample(X, y)

print(f'Original train dataset shape: {Counter(y)}')
# Original train dataset shape: Counter({1: 4679, 0: 321})

print(f'Resampled train dataset shape: {Counter(y_res)}\n')
# Resampled train dataset shape: Counter({0: 4679, 1: 4679})
