Using SVM 'ovo' with RandomUnderSampler

Question

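I am doing classification on an imbalanced dataset. I know that sklearn's SVM has a decision_function_shape hyperparameter that can be set to 'ovo' for "one-vs-one" (although SVC always trains its multiclass models one-vs-one internally; the parameter only changes the shape of the decision function output, and its default is 'ovr').

Since I have chosen 'ovo' together with undersampling, I would like to downsample the majority class of each 'ovo' class pair to the size of that pair's minority class before fitting each 'ovo' model. To make this concrete, suppose I have the following 4-class dataset: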

from sklearn.datasets import make_classification
from collections import Counter

X, y = make_classification(n_samples=1000, n_classes=4,
                           weights=[.1, .15, .2], n_informative=3,
                           random_state=11)

Counter(y)
# Counter({0: 103, 1: 151, 2: 200, 3: 546})

With the SVM 'ovo' scheme there will be nC2 models, i.e. 4C2 = 6. So in each 'ovo' model, the majority-class undersampling should look like this:

Model 1 = Class 0 vs Class 1  # maj: 1=151; RUS to 103 -> 0:103, 1:103
Model 2 = Class 0 vs Class 2  # maj: 2=200; RUS to 103 -> 0:103, 2:103
Model 3 = Class 0 vs Class 3  # maj: 3=546; RUS to 103 -> 0:103, 3:103
Model 4 = Class 1 vs Class 2  # maj: 2=200; RUS to 151 -> 1:151, 2:151
Model 5 = Class 1 vs Class 3  # maj: 3=546; RUS to 151 -> 1:151, 3:151
Model 6 = Class 2 vs Class 3  # maj: 3=546; RUS to 200 -> 2:200, 3:200

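To be precise, the (balanced) number of examples in each participating class is determined by the number of examples in that pair's minority class.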
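The per-pair targets listed above follow from taking the smaller class count in every pair. A minimal sketch of that arithmetic, using only the standard library; the helper name pairwise_balanced_sizes is hypothetical and the counts are the ones quoted above:

```python
from collections import Counter
from itertools import combinations


def pairwise_balanced_sizes(class_counts):
    """Map each class pair (a, b) to the size both classes are undersampled to."""
    sizes = {}
    for a, b in combinations(sorted(class_counts), 2):
        # The smaller class of the pair sets the balanced size.
        sizes[(a, b)] = min(class_counts[a], class_counts[b])
    return sizes


# Class counts as quoted in the question.
counts = Counter({0: 103, 1: 151, 2: 200, 3: 546})
sizes = pairwise_balanced_sizes(counts)

for (a, b), n in sizes.items():
    print(f"Class {a} vs Class {b}: {n} samples per class")
```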

How can I combine this strategy with sklearn's SVC() and imblearn's RandomUnderSampler (RUS)?

Answer 1

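I doubt this can be done easily with SVC's built-in multiclass handling, since that appears to be delegated to libsvm.

You can probably use OneVsOneClassifier, with its estimator being an imblearn pipeline made up of a sampler and an SVC (which will now only ever see binary problems).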
