
Pandas - pd.crosstab gives me incorrect classification

I built a classifier and wanted to try out pd.crosstab. However, it seems to give me an incorrect total number of elements, which is confusing, and I can't figure out why.

Actual code:

df_confusion = pd.crosstab(pd.Series(y_pred), pd.Series(y_test),
                           rownames=['Predicted'], colnames=['Actual'],
                           margins=True)

Typing df_confusion in a Jupyter notebook yields:

Actual       0.0    1.0    All
Predicted
0.0         6529   1951   8480
1.0          718    208    926
All         7247   2159   9406

whereas the number of elements of each category (0 and 1) in y_pred and y_test is as follows:

sum(y_pred==0) equals 34264
sum(y_pred==1) equals 3514

sum(y_test==0) equals 34259
sum(y_test==1) equals 3519
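A quick way to check whether index alignment explains the gap is to compare the two indexes directly. The sketch below is an illustration under assumptions: check_alignment is a hypothetical helper, and the toy y_test and y_pred values stand in for the real data, with y_test assumed to be a pandas Series that kept a non-default index and y_pred assumed to be a plain NumPy array.

```python
import numpy as np
import pandas as pd


def check_alignment(y_pred, y_test):
    """Hypothetical helper: report how many rows pd.crosstab would count.

    pd.crosstab aligns Series inputs on their index, so only index labels
    present in both inputs contribute to the table.
    """
    s_pred = pd.Series(y_pred)  # a NumPy array gets a fresh 0..n-1 index
    s_test = pd.Series(y_test)  # an existing Series keeps its own index
    shared = s_pred.index.intersection(s_test.index)
    return {
        "len_pred": len(s_pred),
        "len_test": len(s_test),
        "shared_labels": len(shared),
        "indexes_equal": s_pred.index.equals(s_test.index),
    }


# Toy data (assumed): y_test keeps a shuffled index, y_pred is a plain array.
y_test = pd.Series([0, 1, 0, 1], index=[7, 2, 9, 0])
y_pred = np.array([0, 1, 0, 0])
print(check_alignment(y_pred, y_test))
```

If shared_labels comes out smaller than both lengths, pd.crosstab counts only those shared rows, which would be consistent with a margin total (9406) far below the 37778 elements counted above.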

However, sklearn's confusion_matrix gives the expected answer:

from sklearn.metrics import confusion_matrix

confusion_matrix(y_test, y_pred)
array([[34259,     0],
       [    5,  3514]], dtype=int64)

I had a similar problem: the total number of elements in the confusion matrix didn't match the number of elements in the series. It was caused by mismatched indices. pd.crosstab aligns Series inputs on their index, so only rows whose index label appears in both series are counted. One of my series was created by dropping rows from a DataFrame, while the other was created in a loop, so the indices of the two series didn't match. I had to apply reset_index(drop=True) to the first series.
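The diagnosis above can be reproduced on a toy example. This is a minimal sketch assuming pandas and NumPy; the DataFrame, the dropped rows and the prediction values are made up for illustration. It builds y_test by dropping rows (leaving gaps in its index) and y_pred as a plain array, then compares the crosstab totals before and after reset_index(drop=True).

```python
import numpy as np
import pandas as pd

# Made-up labels: y_test comes from a DataFrame after dropping rows,
# so its index has gaps; y_pred is a plain array, as from model.predict().
df = pd.DataFrame({"label": [0, 1, 1, 0, 1, 0]})
y_test = df.drop(index=[1, 3])["label"]  # index labels: 0, 2, 4, 5
y_pred = np.array([0, 1, 1, 1])          # pd.Series(y_pred) gets 0..3

# Misaligned: only index labels 0 and 2 are shared, so 2 rows are counted.
bad = pd.crosstab(pd.Series(y_pred), y_test,
                  rownames=["Predicted"], colnames=["Actual"], margins=True)

# Fix: give y_test a fresh 0..n-1 index so rows pair up by position.
good = pd.crosstab(pd.Series(y_pred), y_test.reset_index(drop=True),
                   rownames=["Predicted"], colnames=["Actual"], margins=True)

print(bad.loc["All", "All"], good.loc["All", "All"])
```

Here bad counts only the two shared index labels, while good counts all four rows. Passing both inputs as NumPy arrays (for example y_test.to_numpy()) should also avoid the problem, because pd.crosstab only aligns inputs that are Series or DataFrames.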
