sklearn: how to simulate a zero-probability prediction for a label that does not appear in the training set?

Question

I hope my title is accurate.

I am running a multiclass classification (3 classes) and scoring it with ROC AUC:

make_scorer(roc_auc_score, needs_proba=True, average="macro", multi_class='ovo', labels=[-1, 0, 1])

I split the data into train/test folds with a time-series splitter, so I cannot reorder the data (there is no stratify parameter).

In one of the splits, the training data does not contain the "0" label. So .fit only sees 2 labels, and predict_proba therefore outputs only 2 columns.
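A quick way to find the affected folds before fitting anything is to check the label set of each training window. The sketch below uses a hypothetical helper, splits_with_all_labels, and a made-up toy target; it yields only the TimeSeriesSplit folds whose training part contains every label and reports the rest. Skipping such folds is a simpler substitute for enlarging the training window, not the same thing. scikit-learn accepts an iterable of (train, test) index pairs as cv=, so the resulting list can be passed straight to cross_val_score.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit


def splits_with_all_labels(y, labels, n_splits=5):
    """Hypothetical helper: yield TimeSeriesSplit folds whose training window
    contains every label in `labels`; report and skip the others."""
    y = np.asarray(y)
    required = set(labels)
    for fold, (train_idx, test_idx) in enumerate(TimeSeriesSplit(n_splits=n_splits).split(y)):
        missing = required - set(y[train_idx].tolist())
        if missing:
            print(f"fold {fold}: training window lacks labels {sorted(missing)}, skipped")
            continue
        yield train_idx, test_idx


# Toy target (an assumption for illustration): the first 10 samples contain only -1 and 1.
y = np.array([-1, 1] * 5 + [-1, 0, 1] * 6 + [-1, 0])
folds = list(splits_with_all_labels(y, labels=[-1, 0, 1], n_splits=2))
print(len(folds))
```

With 30 samples and n_splits=2, the first fold trains on samples 0 to 9 (no label 0) and is skipped; only the second fold, trained on samples 0 to 19, is kept.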

When I run the multiclass ROC AUC scoring, I get this ValueError: Number of given labels, 3, not equal to the number of columns in 'y_score', 2.
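The error can be reproduced without any model: it is enough to pass a two-column probability array together with three labels. The sketch below uses made-up probabilities (an assumption for illustration) that stand in for a model fitted only on -1 and 1, and then shows that inserting a zero column at the position of label 0 makes the shapes agree.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

LABELS = [-1, 0, 1]
# The test fold contains all three labels...
y_test = np.array([-1, 0, 1, 1, -1, 0])
# ...but a model fitted without label 0 returns only two columns (for -1 and 1).
proba_2col = np.array([
    [0.8, 0.2],
    [0.6, 0.4],
    [0.3, 0.7],
    [0.1, 0.9],
    [0.7, 0.3],
    [0.5, 0.5],
])

error_message = None
try:
    roc_auc_score(y_test, proba_2col, average="macro", multi_class="ovo", labels=LABELS)
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# Insert a zero-probability column at index 1, the position of label 0 in LABELS.
proba_3col = np.insert(proba_2col, 1, 0.0, axis=1)
score = roc_auc_score(y_test, proba_3col, average="macro", multi_class="ovo", labels=LABELS)
print(score)
```

Rows still sum to 1 after the insert, which roc_auc_score requires for multiclass input; the all-zero column simply gives that class no ranking information.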

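I think I can accept the model predicting a probability of zero for class "0". So I want to add a mock probability prediction for it. Is there a way to do this with the standard library?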

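Any other suggestions? I thought of 1) wrapping predict_proba to add the missing probability column, or 2) changing the time-series split so that, if the training data contains only 2 classes, more training data is taken.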
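Idea 1 can also be applied at scoring time instead of inside the estimator. scikit-learn accepts any callable with the signature scorer(estimator, X, y) as scoring=, so a custom scorer can pad the output of predict_proba using the fitted estimator's classes_ before calling roc_auc_score. The names pad_proba and padded_roc_auc, and the toy data, are hypothetical; this is a sketch under those assumptions, not a built-in scikit-learn feature.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

LABELS = [-1, 0, 1]  # must be sorted for roc_auc_score(labels=...)


def pad_proba(proba, fitted_classes, all_labels):
    """Hypothetical helper: move each fitted column under its label and leave
    labels never seen during fit at probability 0."""
    all_labels = list(all_labels)
    padded = np.zeros((proba.shape[0], len(all_labels)))
    for col, cls in enumerate(fitted_classes):
        padded[:, all_labels.index(cls)] = proba[:, col]
    return padded


def padded_roc_auc(estimator, X, y):
    """Scorer with the (estimator, X, y) signature accepted by scoring=."""
    proba = pad_proba(estimator.predict_proba(X), estimator.classes_, LABELS)
    return roc_auc_score(y, proba, average="macro", multi_class="ovo", labels=LABELS)


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = np.array([-1, 1] * 5 + [-1, 0, 1] * 6 + [-1, 0])  # first training window has no 0

scores = cross_val_score(
    LogisticRegression(), X, y,
    cv=TimeSeriesSplit(n_splits=2),
    scoring=padded_roc_auc,
    error_score="raise",  # surface scoring errors instead of returning NaN
)
print(scores)
```

Because the estimator itself is untouched, nothing about its classes_, predict or predict_proba changes; only the score computation sees the padded matrix.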

Answer 1

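I'll post what I ended up doing. I wrote a wrapper function that returns a subclass of the estimator class (e.g. LogisticRegression) with augmented predict_proba and fit methods. The fit method saves the labels it saw in y_train, and predict_proba fills with zeros the columns for labels that are in labels but not in y_train.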

from functools import wraps
from typing import Callable, List, Type

import numpy as np
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression


def predict_proba_wrapper(method: Callable, labels: list):
    """Add zero columns to the predict_proba output for labels not present in y_train.

    Assumes `labels` is sorted, i.e. in the same order the estimator uses for classes_.
    """

    @wraps(method)
    def wrapper(self, *args, **kwargs):
        # find the indices of labels that were not in y_train
        indices_to_fill = []
        for i, label in enumerate(labels):
            if label not in self.labels_seen:
                indices_to_fill.append(i)
        # call the original predict_proba
        y_pred = method(self, *args, **kwargs)
        if not isinstance(y_pred, np.ndarray):
            y_pred_np = np.array(y_pred)
        else:
            y_pred_np = y_pred

        # insert a zero column at each missing position; ascending order keeps later indices valid
        for i in indices_to_fill:
            y_pred_np = np.insert(y_pred_np, i, 0., axis=1)

        # return the same container type the original method produced
        # (a pd.Series cannot hold the 2-D probability matrix, so it is not handled)
        if isinstance(y_pred, np.ndarray):
            return y_pred_np
        elif isinstance(y_pred, pd.DataFrame):
            return pd.DataFrame(y_pred_np, index=y_pred.index)
        elif isinstance(y_pred, list):
            return y_pred_np.tolist()
        else:
            raise ValueError(f"y_pred type {type(y_pred)} not supported")

    return wrapper


def fit_wrapper(method: Callable, labels: list):
    """Record the labels seen during fit and report the full label set as classes_."""

    @wraps(method)
    def wrapper(self, *args, **kwargs):
        res = method(self, *args, **kwargs)
        # y is the second positional argument of fit(X, y), or the `y` keyword
        if len(args) >= 2:
            y = args[1]
        else:
            y = kwargs["y"]
        if isinstance(y, np.ndarray):
            self.labels_seen = list(np.unique(y))
        elif isinstance(y, pd.DataFrame):
            self.labels_seen = list(y.iloc[:, 0].unique())
        elif isinstance(y, pd.Series):
            self.labels_seen = list(y.unique())
        elif isinstance(y, list):
            self.labels_seen = list(set(y))
        else:
            raise ValueError(f"y type {type(y)} not supported")
        # overwrite classes_ so that code reading it sees every label, seen or not
        if hasattr(self, "classes_"):
            if isinstance(self.classes_, np.ndarray):
                self.classes_ = np.array(labels)
            elif isinstance(self.classes_, pd.DataFrame):
                self.classes_ = pd.DataFrame(labels)
            elif isinstance(self.classes_, pd.Series):
                self.classes_ = pd.Series(labels)
            elif isinstance(self.classes_, list):
                self.classes_ = labels
            else:
                raise ValueError(f"classes_ type {type(self.classes_)} not supported")

        return res

    return wrapper


def class_child_with_wrapped_methods(class_: Type, method_names: List[str], wrappers: List[Callable]):
    """Return a subclass of class_ whose listed methods are wrapped by the matching wrappers."""
    new_class = type(class_.__name__ + "Wrapped", (class_,), {})
    for method_name, wrapper in zip(method_names, wrappers):
        setattr(new_class, method_name, wrapper(getattr(new_class, method_name)))
    return new_class


def wrap_fit_predict_proba(class_: Type, labels: list):
    """Return a subclass with predict_proba and fit wrapped by the wrappers above."""
    return class_child_with_wrapped_methods(
        class_,
        ["predict_proba", "fit"],
        [
            lambda x: predict_proba_wrapper(x, labels),
            lambda x: fit_wrapper(x, labels),
        ],
    )


CLASSIFIERS = [
    wrap_fit_predict_proba(LogisticRegression, labels=[-1, 0, 1]),
    wrap_fit_predict_proba(ExtraTreesClassifier, labels=[-1, 0, 1]),
]
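One caveat with overwriting classes_ on the fitted estimator itself: other methods read that attribute too. For example, LogisticRegression.predict turns a binary decision into index 0 or 1 of classes_, so after the override a model trained on {-1, 1} can return 0 where it should return 1. A composition-based alternative keeps the inner estimator untouched and only pads on the outside. The sketch below is a hypothetical meta-estimator (the class name PaddedProbaClassifier and its parameters are assumptions, not a scikit-learn API); it follows the usual BaseEstimator conventions so that clone and cross-validation can rebuild it from its parameters.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression


class PaddedProbaClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper: reports probabilities for every label in `labels`,
    using 0 for labels the inner estimator never saw during fit."""

    def __init__(self, estimator, labels):
        self.estimator = estimator
        self.labels = labels

    def fit(self, X, y):
        # fit a fresh copy; the inner estimator keeps its own, unmodified classes_
        self.estimator_ = clone(self.estimator).fit(X, y)
        self.classes_ = np.asarray(self.labels)
        return self

    def predict_proba(self, X):
        proba = self.estimator_.predict_proba(X)
        padded = np.zeros((proba.shape[0], len(self.classes_)))
        for col, cls in enumerate(self.estimator_.classes_):
            # place each fitted column under the matching label
            padded[:, np.flatnonzero(self.classes_ == cls)[0]] = proba[:, col]
        return padded

    def predict(self, X):
        # unseen labels have probability 0, so argmax never selects them
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]


rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 3))
y_train = np.array([-1, 1] * 10)  # label 0 never appears in training

clf = PaddedProbaClassifier(ExtraTreesClassifier(n_estimators=10, random_state=0), labels=[-1, 0, 1])
clf.fit(X_train, y_train)
proba = clf.predict_proba(X_train[:4])
print(proba.shape)  # (4, 3)

lr = PaddedProbaClassifier(LogisticRegression(), labels=[-1, 0, 1]).fit(X_train, y_train)
print(lr.predict(X_train[:4]))  # only -1 or 1, never the unseen 0
```

The labels list should be sorted, since roc_auc_score(labels=...) rejects unordered labels and the padded columns follow that order.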

Solution 1 (score 0, accepted), 2022-10-06 17:04:42
