imblearn SMOTE+ENN under-sampled the majority class
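I have an imbalanced dataset, and when I try to balance it with SMOTEENN, the number of samples in the majority class drops by about half.
I tried changing the `sampling_strategy` parameter to every option it offers, but nothing helped.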
from collections import Counter
from imblearn.combine import SMOTEENN
sme = SMOTEENN()
X_res, y_res = sme.fit_resample(X_train, y_train)
print(f'Original train dataset shape: {Counter(y_train)}')
# Original train dataset shape: Counter({1: 2194, 0: 205})
print(f'Resampled train dataset shape: {Counter(y_res)}\n')
# Resampled train dataset shape: Counter({0: 2117, 1: 1226})
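If you look at the SMOTEENN documentation ( https://imbalanced-learn.readthedocs.io/en/stable/generation/imblearn.combine.SMOTEENN.html#imblearn.combine.SMOTEENN ):
Combine over- and under-sampling using SMOTE and Edited Nearest Neighbours.
So the drop in the majority class is expected: SMOTE first over-samples the minority class, and the Edited Nearest Neighbours step then removes samples.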
If you want an equal number of samples in each class, you can try a different technique, such as over_sampling.SMOTE.
For example:
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.combine import SMOTEENN
from imblearn.over_sampling import SMOTE

# Build an imbalanced two-class dataset (about 6% / 94%).
X, y = make_classification(n_samples=5000, n_features=2, n_informative=2,
                           n_redundant=0, n_repeated=0, n_classes=2,
                           n_clusters_per_class=1,
                           weights=[0.06, 0.94],
                           class_sep=0.1, random_state=0)

# SMOTEENN: over-sample with SMOTE, then clean with ENN (class sizes differ).
sme = SMOTEENN()
X_res, y_res = sme.fit_resample(X, y)
print(f'Original train dataset shape: {Counter(y)}')
# Original train dataset shape: Counter({1: 4679, 0: 321})
print(f'Resampled train dataset shape: {Counter(y_res)}\n')
# Resampled train dataset shape: Counter({0: 3561, 1: 3246})

# SMOTE alone: over-sample only, so both classes end up the same size.
sm = SMOTE()
X_res, y_res = sm.fit_resample(X, y)
print(f'Original train dataset shape: {Counter(y)}')
# Original train dataset shape: Counter({1: 4679, 0: 321})
print(f'Resampled train dataset shape: {Counter(y_res)}\n')
# Resampled train dataset shape: Counter({0: 4679, 1: 4679})