
Plots for multi-class classification?

I was looking through the scikit-learn documentation and realized that most of the plotting-curve support exists for binary classification.

I wanted to plot the precision_recall curve and the learning curve for my multi-class classifier.

model1 = LogisticRegression()
# (model1 is assumed to have been fitted on training data before predicting)
y_d = model1.predict_proba(matrix_test)

I was wondering if there exists any method for plotting the precision_recall curve and the learning curve for my classifier, with a sparse matrix_test of size (22428, 22000) and labels being a NumPy array of size (22428,)?

If you look at the definition of Precision and Recall, you can see that there is an asymmetry there that does not directly translate into higher dimensions. Say the classes are C1 and C2; then, arbitrarily, one of them is considered "True" and the other "False" (notice also that there is no symmetry: reversing "True" and "False" will not give the same results). In higher dimensions, this simply can't be done directly.
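The asymmetry is easy to see with a tiny made-up example (labels invented for illustration): swapping which class counts as "True" changes the precision.

```python
import numpy as np
from sklearn.metrics import precision_score

# Hypothetical binary labels, purely for illustration
y_true = np.array([1, 1, 0, 0, 0])
y_pred = np.array([1, 0, 1, 0, 0])

# Precision treating class 1 as "True": 1 correct out of 2 predicted positives
p_pos = precision_score(y_true, y_pred, pos_label=1)

# Precision treating class 0 as "True": 2 correct out of 3 predicted negatives
p_neg = precision_score(y_true, y_pred, pos_label=0)

print(p_pos, p_neg)  # the two views disagree: 0.5 vs 0.666...
```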

There are many ways to come up with heuristic extensions, though. Say your classes are C1, ..., Cm. You can calculate m precisions and recalls from the viewpoint of class Ci, then take a (weighted) average. The weights should probably reflect the importance of the classes.

Note that this is exactly the scheme used for the binary case, where the weight for the "True" class is chosen to be 1 and the weight for the "False" class is chosen to be 0 (again, emphasizing the arbitrary asymmetry of this score).

In terms of implementation, this is trivial. Say your confusion matrix is m. Then, from the point of view of class i, the precision is m[i, i] / np.sum(m[:, i]) and the recall is m[i, i] / np.sum(m[i, :]).
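Spelled out with a hypothetical 3x3 confusion matrix (rows = true class, columns = predicted class, values invented for illustration):

```python
import numpy as np

# Hypothetical confusion matrix: rows are true classes, columns are predictions
m = np.array([[5, 1, 0],
              [2, 6, 1],
              [0, 2, 7]])

i = 1  # class of interest

# Precision: correct predictions of class i over all predictions of class i (column sum)
precision_i = m[i, i] / np.sum(m[:, i])

# Recall: correct predictions of class i over all true instances of class i (row sum)
recall_i = m[i, i] / np.sum(m[i, :])
```

Looping `i` over all classes and averaging (with whatever weights you choose) gives the heuristic multi-class extension described above.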
