How to plot a groupby as percentages in seaborn?

Question

I have a binary classification problem that I want to solve with a RandomForestClassifier. My target column is 'successful', which is either 0 or 1. I want to explore the data and see what it looks like. For that I tried count plots by category, but they don't show what percentage of the total is 'successful' (i.e. successful == 1).

How can I change the following plot so that the subplots display the percentage of successful posts (successful == 1) out of all posts in each group? For example, in the category weekday, if 'Saturday' has 10 data points and 7 of them are successful ('successful' == 1), I want the bar for that day to reach 0.7.

Here is the current plot (it shows counts :-/):

And here is a part of my dataframe:

And here is the code used to generate the current plot:

# Plot 

sns.set(style="darkgrid")

x_vals = [['page_name', 'weekday'],['type', 'industry']]
subtitles = [['by Page', 'by Weekday'],['by Content Type', 'by Industry']]

fig, ax = plt.subplots(2,2, figsize=(15,10))
#jitter = [[False, 1], [0.5, 0.2]]

for j in range(len(ax)):
    for i in range(len(ax[j])):
        ax[j][i].tick_params(labelsize=15)
        ax[j][i].set_xlabel('label', fontsize=17, position=(.5,20))
        if (j == 0) :
            ax[j][i].tick_params(axis="x", rotation=50) 
        ax[j][i].set_ylabel('label', fontsize=17)
        ax[j][i] = sns.countplot(x=x_vals[j][i], hue="successful", data=mainDf, ax=ax[j][i])

for j in range(len(ax)):
    for i in range(len(ax[j])):
        ax[j][i].set_xlabel('', fontsize=17)
        ax[j][i].set_ylabel('count', fontsize=17)
        ax[j][i].set_title(subtitles[j][i], fontsize=18)

fig.suptitle('Success Count by Category', position=(.5,1.05), fontsize=20)

fig.tight_layout()
fig.show()

PS: Please note, I am using Seaborn. The solution should also use Seaborn, if possible. Thanks!

Answer 1

You can use barplot here. I wasn't 100% sure what you actually want to achieve, so I developed several solutions.

Frequency of successful (unsuccessful) per total successful (unsuccessful)

fig, axes = plt.subplots(2, 2, figsize=(15, 10))

mainDf['frequency'] = 0 # a dummy column to refer to
for col, ax in zip(['page_name', 'weekday', 'type', 'industry'], axes.flatten()):
    counts = mainDf.groupby([col, 'successful']).count()
    freq_per_group = counts.div(counts.groupby('successful').transform('sum')).reset_index()
    sns.barplot(x=col, y='frequency', hue='successful', data=freq_per_group, ax=ax)
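
The dummy frequency column above only exists so that count() has a column to report. A minimal sketch of the same per-class normalization using size() instead, on a small made-up frame (the data and the name toy are assumptions for illustration, not the asker's data):

```python
import pandas as pd

# Hypothetical toy data: 10 Saturday posts (7 successful), 5 Sunday posts (1 successful).
toy = pd.DataFrame({
    "weekday": ["Saturday"] * 10 + ["Sunday"] * 5,
    "successful": [1] * 7 + [0] * 3 + [1] * 1 + [0] * 4,
})

# Number of rows for each (weekday, successful) pair.
counts = toy.groupby(["weekday", "successful"]).size()

# Divide each count by the total of its 'successful' class,
# so the bars of one hue sum to 1 across all weekdays.
class_totals = counts.groupby(level="successful").transform("sum")
freq_per_class = (counts / class_totals).reset_index(name="frequency")
print(freq_per_class)
```

The resulting long-form frame has one row per (weekday, successful) pair and can be passed to sns.barplot with x='weekday', y='frequency', hue='successful'.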

Frequency of successful (unsuccessful) per group

fig, axes = plt.subplots(2, 2, figsize=(15, 10))

mainDf['frequency'] = 0 # a dummy column to refer to
for col, ax in zip(['page_name', 'weekday', 'type', 'industry'], axes.flatten()):
    counts = mainDf.groupby([col, 'successful']).count()
    freq_per_group = counts.div(counts.groupby(col).transform('sum')).reset_index()
    sns.barplot(x=col, y='frequency', hue='successful', data=freq_per_group, ax=ax)
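
This per-group variant is the one that matches the Saturday example in the question. Because successful is 0 or 1, the share of successful posts in a group is simply the group mean. A minimal sketch on made-up data (the frame toy and its values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical toy data: 10 Saturday posts (7 successful), 5 Sunday posts (1 successful).
toy = pd.DataFrame({
    "weekday": ["Saturday"] * 10 + ["Sunday"] * 5,
    "successful": [1] * 7 + [0] * 3 + [1] * 1 + [0] * 4,
})

# Mean of a 0/1 column per group = fraction of successful posts in that group.
success_rate = toy.groupby("weekday")["successful"].mean()
print(success_rate)

# Long-form shares of both classes within each weekday (each weekday sums to 1),
# in a shape that barplot can use with hue='successful'.
per_group = (
    toy.groupby("weekday")["successful"]
    .value_counts(normalize=True)
    .reset_index(name="frequency")
)
print(per_group)
```

For Saturday, 7 of the 10 rows are successful, so the Saturday entry of success_rate is the 0.7 asked for in the question.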

which, based on the data you provided, gives

Frequency of successful (unsuccessful) per total

fig, axes = plt.subplots(2, 2, figsize=(15, 10))

mainDf['frequency'] = 0 # a dummy column to refer to
total = len(mainDf)
for col, ax in zip(['page_name', 'weekday', 'type', 'industry'], axes.flatten()):
    counts = mainDf.groupby([col, 'successful']).count()
    freq_per_total = counts.div(total).reset_index()
    sns.barplot(x=col, y='frequency', hue='successful', data=freq_per_total, ax=ax)
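
As a cross-check of the per-total normalization, pandas can build the same table with pd.crosstab and normalize='all'. This is a sketch on made-up data, not the asker's dataframe; the names toy, table and long_form are assumptions for illustration:

```python
import pandas as pd

# Hypothetical toy data: 10 Saturday posts (7 successful), 5 Sunday posts (1 successful).
toy = pd.DataFrame({
    "weekday": ["Saturday"] * 10 + ["Sunday"] * 5,
    "successful": [1] * 7 + [0] * 3 + [1] * 1 + [0] * 4,
})

# normalize='all' divides every cell by the total number of rows,
# so all cells together sum to 1.
table = pd.crosstab(toy["weekday"], toy["successful"], normalize="all")
print(table)

# Reshape to long form (one row per weekday/successful pair) for plotting.
long_form = table.reset_index().melt(
    id_vars="weekday", var_name="successful", value_name="frequency"
)
print(long_form)
```

Passing normalize='index' or normalize='columns' instead gives the per-group and per-class variants shown earlier in this answer.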

Answer 2

Change the line

ax[j][i] = sns.countplot(x=x_vals[j][i], hue="successful", data=mainDf, ax=ax[j][i])

to

ax[j][i] = sns.barplot(x=x_vals[j][i], y='successful', data=mainDf, ax=ax[j][i], ci=None, estimator=lambda x: sum(x) / len(x) * 100)

Because 'successful' is 0 or 1, this estimator returns the percentage of successful posts in each category.

Your code would be

sns.set(style="darkgrid")

x_vals = [['page_name', 'weekday'],['type', 'industry']]
subtitles = [['by Page', 'by Weekday'],['by Content Type', 'by Industry']]

fig, ax = plt.subplots(2,2, figsize=(15,10))
#jitter = [[False, 1], [0.5, 0.2]]

for j in range(len(ax)):
    for i in range(len(ax[j])):
        ax[j][i].tick_params(labelsize=15)
        ax[j][i].set_xlabel('label', fontsize=17, position=(.5,20))
        if (j == 0) :
            ax[j][i].tick_params(axis="x", rotation=50) 
        ax[j][i].set_ylabel('label', fontsize=17)
        ax[j][i] = sns.barplot(x=x_vals[j][i], y='successful', data=mainDf, ax=ax[j][i], ci=None, estimator=lambda x: sum(x) / len(x) * 100)

for j in range(len(ax)):
    for i in range(len(ax[j])):
        ax[j][i].set_xlabel('', fontsize=17)
        ax[j][i].set_ylabel('percent', fontsize=17)
        ax[j][i].set_title(subtitles[j][i], fontsize=18)

fig.suptitle('Success Percentage by Category', position=(.5,1.05), fontsize=20)

fig.tight_layout()
fig.show()
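
An alternative sketch, assuming made-up data, computes the percentages with pandas first and hands seaborn one row per category, so no estimator or error-bar arguments are needed. This may be useful because newer seaborn releases (0.12 and later) deprecate ci in favor of errorbar. The frame toy and the column name percent are assumptions for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: 10 Saturday posts (7 successful), 5 Sunday posts (1 successful).
toy = pd.DataFrame({
    "weekday": ["Saturday"] * 10 + ["Sunday"] * 5,
    "successful": [1] * 7 + [0] * 3 + [1] * 1 + [0] * 4,
})

# Mean of the 0/1 column times 100 = percentage of successful posts per weekday.
pct = (
    toy.groupby("weekday")["successful"]
    .mean()
    .mul(100)
    .reset_index(name="percent")
)

fig, ax = plt.subplots()
# One row per weekday, so each bar height is exactly the precomputed percentage.
sns.barplot(x="weekday", y="percent", data=pct, ax=ax)
ax.set_ylabel("percent")
```

With the asker's data, the same three steps would run once per column inside the existing loop over x_vals.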

2 answers

Answer 1: score 2, accepted, 2018-12-14 16:44:34

Answer 2: score 0, 2018-12-14 17:04:34
