How to perform multiclass-multilabel classification in sklearn?

I have a multiclass-multioutput classification problem (see https://scikit-learn.org/stable/modules/multiclass.html for details). In other words, my dataset looks as follows.

node_name, feature1, feature2, ... label_1, label_2
node1,      1.2,        1.8, ...,     0,       2
node2,      1.0,        1.1, ...,     1,       1
node3,      1.9,        1.2, ...,     0,       3 
...
...
...

So, my label_1 could be either 0 or 1, whereas my label_2 could be either 0, 1, or 2.

Since I have two labels (i.e. label_1 and label_2), my question is how to pass both of them to a classifier in sklearn.

In my current code I am using RandomForestClassifier, as shown below. However, I could not find a useful resource that describes how to use the random forest classifier for multiclass-multilabel classification. If RandomForestClassifier does not support multiclass-multilabel classification, I am totally fine with moving to another classifier that supports it. My current code is as follows.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

clf = RandomForestClassifier(random_state=42, class_weight="balanced")
k_fold = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_validate(clf, X, y, cv=k_fold, scoring=('accuracy', 'precision_weighted', 'recall_weighted', 'f1_weighted', 'roc_auc'))

I am happy to provide more details if needed.

Looking at the link you provided (under the 'Support multiclass-multioutput:' list) and at the parameters of RandomForestClassifier's fit method, it seems that RandomForestClassifier supports multiclass-multioutput out of the box. All you need to do is format y correctly when you pass it to the classifier: a 2-D array with one row per sample and one column per label. It should be:

import numpy as np
y = np.array([['0', '2'], ['1', '1'], ['0', '3']])

for the first 3 nodes you provided.
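To make this concrete, here is a minimal sketch of fitting and cross-validating with a two-column y. The data is synthetic and only stands in for your dataset, and the per-column accuracy loop is my own choice of evaluation, not something from your code. As far as I know, StratifiedKFold and the built-in scorer strings such as 'accuracy' reject 2-D multiclass-multioutput targets, so the sketch swaps in plain KFold and scores each label column separately.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold

# Synthetic stand-in data (an assumption, not the asker's dataset).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.column_stack([
    rng.integers(0, 2, size=200),  # label_1 takes values in {0, 1}
    rng.integers(0, 3, size=200),  # label_2 takes values in {0, 1, 2}
])

clf = RandomForestClassifier(n_estimators=50, random_state=42)

# Plain KFold instead of StratifiedKFold, since y is 2-D here.
k_fold = KFold(n_splits=5, shuffle=True, random_state=0)

fold_scores = []
for train_idx, test_idx in k_fold.split(X):
    clf.fit(X[train_idx], y[train_idx])  # y has shape (n_samples, 2)
    pred = clf.predict(X[test_idx])      # pred has shape (n_test, 2)
    # Score each label column on its own.
    fold_scores.append([
        accuracy_score(y[test_idx][:, j], pred[:, j])
        for j in range(y.shape[1])
    ])

mean_acc = np.mean(fold_scores, axis=0)  # one mean accuracy per label
print("mean accuracy for label_1, label_2:", mean_acc)
```

The same loop works with any other single-output metric (for example f1_score with average='weighted') applied column by column.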
