
Random Undersampling with relative rather than absolute ratios

The function imblearn.under_sampling.RandomUnderSampler only lets me specify the amount of undersampling as absolute sample counts via a dict. Absolute counts break (time-series) cross-validation, because the number of minority-class samples differs from fold to fold. This keeps producing errors such as: "Originally, there is 11037 samples and 28546 samples are asked."

Is there any way to input relative values, i.e. 80% for class 0 and 20% for class 1, etc.?

I think this is a minimal working example that solves your problem. It recomputes the absolute counts from percentages inside every fold.

from collections import Counter
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from imblearn.under_sampling import RandomUnderSampler

def classify(datasets, labels, *args):
    # args[k] is the percentage of class k to keep in each training fold
    kf = KFold(n_splits=3)
    for train_idx, test_idx in kf.split(datasets):
        print('Original dataset shape {}'.format(Counter(labels[train_idx])))
        train_x, train_y = datasets[train_idx], labels[train_idx]
        test_x, test_y = datasets[test_idx], labels[test_idx]
        # Turn the percentages into absolute counts for this fold only
        fold_counts = Counter(train_y)
        ratio_dict = {}
        for k, v in enumerate(args):
            ratio_dict.update({k: int((v / 100) * fold_counts[k])})
        print(ratio_dict)
        # imbalanced-learn >= 0.4 uses sampling_strategy / fit_resample
        # (older releases called these ratio / fit_sample)
        rus = RandomUnderSampler(random_state=42, sampling_strategy=ratio_dict)
        # Resample the training fold only, never the full dataset
        X_res, y_res = rus.fit_resample(train_x, train_y)
        print('Resampled dataset shape {}'.format(Counter(y_res)))


X, y = make_classification(n_classes=2, class_sep=2,
    weights=[0.1, 0.9], n_informative=3, n_redundant=1, flip_y=0,
    n_features=20, n_clusters_per_class=1, n_samples=1000, random_state=10)

# Keep 10% of class 0 and 20% of class 1 in every training fold
classify(X, y, 10, 20)
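
The example above keeps a percentage of each class. If "80% for class 0 and 20% for class 1" is instead meant as the class shares of the resampled set, the counts can be derived from whatever the fold contains. The following is only a sketch under that assumption. The helper names counts_for_composition and make_strategy are made up for illustration and are not part of imblearn. As far as I know, sampling_strategy also accepts a callable that takes y and returns such a dict, so the function returned by make_strategy could be passed there. For a binary problem a float (minority/majority ratio after resampling, e.g. 0.25 for 80/20) should achieve the same.

```python
from collections import Counter


def counts_for_composition(y, shares):
    """Return {label: n_samples} so the kept samples match `shares`.

    `shares` maps each class label to its desired share of the
    resampled set, e.g. {0: 0.8, 1: 0.2}. Only undersampling is
    possible, so the total size is capped by the scarcest class.
    """
    if any(share <= 0 for share in shares.values()):
        raise ValueError("every share must be positive")
    available = Counter(y)
    # Largest total for which no class needs more samples than it has
    total = min(available[label] / share for label, share in shares.items())
    # min() guards against float rounding pushing a count above what exists
    return {label: min(available[label], round(total * share))
            for label, share in shares.items()}


def make_strategy(shares):
    """Wrap `shares` in a one-argument callable that only takes y."""
    def strategy(y):
        return counts_for_composition(y, shares)
    return strategy


if __name__ == "__main__":
    labels = [0] * 900 + [1] * 100
    print(counts_for_composition(labels, {0: 0.8, 1: 0.2}))
```

Because the counts are recomputed from the labels that are passed in, the same shares work for folds of any size.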

I would love to see if somebody can implement this using sklearn's Pipeline Framework.

