
Pandas side-by-side stacked bar plot

I want to create a stacked bar plot of the Titanic dataset. The plot needs to group by "Pclass", "Sex" and "Survived". I have managed to do this with a lot of tedious numpy manipulation to produce a normalized plot (where "M" is male and "F" is female).

Is there a way to do this using pandas' built-in plotting functionality?

I have tried this:

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('train.csv')
df_grouped = df.groupby(['Survived','Sex','Pclass'])['Survived'].count()
df_grouped.unstack().plot(kind='bar', stacked=True, colormap='Blues', grid=True, figsize=(13, 5))


This is not what I want. Is there any way to produce the first plot using pandas plotting? Thanks in advance.

The resulting bars will not be adjacent to each other as in your first figure, but apart from that, pandas lets you do what you want as follows:

# Share of survivors and of deaths within each (Pclass, Sex) group
df_g = df.groupby(['Pclass', 'Sex'])['Survived'].agg(['mean', lambda x: 1 - x.mean()])
df_g.columns = ['Survived', 'Died']
df_g.plot.bar(stacked=True)
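
As a possible alternative for building the same normalized table, `pd.crosstab` with `normalize='index'` gives row-wise shares directly. The sketch below is only an illustration: the small `df` is made-up data standing in for `train.csv`, and the names `shares` and `by_group` are hypothetical.

```python
import pandas as pd

# Made-up rows standing in for train.csv (only the three columns used here).
df = pd.DataFrame({
    'Pclass':   [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    'Sex':      ['female', 'female', 'male', 'male',
                 'female', 'male', 'male',
                 'female', 'female', 'male', 'male', 'male'],
    'Survived': [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1],
})

# Row-normalized contingency table: one row per (Pclass, Sex) group,
# one column per outcome, each row summing to 1.
shares = pd.crosstab([df['Pclass'], df['Sex']], df['Survived'], normalize='index')
shares = shares.rename(columns={0: 'Died', 1: 'Survived'})[['Survived', 'Died']]

# The same table built with groupby, for comparison.
by_group = df.groupby(['Pclass', 'Sex'])['Survived'].agg(['mean', lambda x: 1 - x.mean()])
by_group.columns = ['Survived', 'Died']
```

Either table can then be passed to `.plot.bar(stacked=True)` in the same way as `df_g` above.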


Here, the horizontal grouping of patches is complicated by the requirement of stacking. If, for instance, we only cared about the value of "Survived", pandas could take care of it out of the box:

df.groupby(['Pclass', 'Sex'])['Survived'].mean().unstack().plot.bar()
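
To see why the one-liner groups bars by class, it may help to look at the intermediate table: `unstack()` moves the inner index level ("Sex") into the columns, and pandas then draws one bar per column within each row. A small sketch on made-up data (the `df`, `rates` and `wide` names are illustrative only):

```python
import pandas as pd

# Made-up rows standing in for train.csv.
df = pd.DataFrame({
    'Pclass':   [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    'Sex':      ['female', 'female', 'male', 'male',
                 'female', 'male', 'male',
                 'female', 'female', 'male', 'male', 'male'],
    'Survived': [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1],
})

# Series indexed by (Pclass, Sex): survival rate per group.
rates = df.groupby(['Pclass', 'Sex'])['Survived'].mean()

# Rows are Pclass, columns are Sex; plot.bar() draws one bar per column
# side by side for every row.
wide = rates.unstack()
print(wide)
```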


If an ad hoc solution that post-processes the plot suffices, doing so is also not terribly complicated:

import numpy as np

# Share of survivors and of deaths within each (Pclass, Sex) group
df_g = df.groupby(['Pclass', 'Sex'])['Survived'].agg(['mean', lambda x: 1 - x.mean()])
df_g.columns = ['Survived', 'Died']
ax = df_g.plot.bar(stacked=True)

# Move back every second patch
for i in range(6):
    new_x = ax.patches[i].get_x() - (i%2)/2
    ax.patches[i].set_x(new_x)
    ax.patches[i+6].set_x(new_x)

# Update tick locations correspondingly
minor_tick_locs = [x.get_x()+1/4 for x in ax.patches[:6]]
major_tick_locs = np.array([x.get_x()+1/4 for x in ax.patches[:6]]).reshape(3, 2).mean(axis=1)
ax.set_xticks(minor_tick_locs, minor=True)
ax.set_xticks(major_tick_locs)

# Use indices from dataframe as tick labels
minor_tick_labels = df_g.index.get_level_values(1).values
major_tick_labels = df_g.index.levels[0].values
ax.xaxis.set_ticklabels(minor_tick_labels, minor=True)
ax.xaxis.set_ticklabels(major_tick_labels)

# Remove ticks and organize tick labels to avoid overlap
ax.tick_params(axis='x', which='both', bottom=False)
ax.tick_params(axis='x', which='minor', rotation=45)
ax.tick_params(axis='x', which='major', pad=35, rotation=0)
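
The hard-coded `6` and `reshape(3, 2)` above are specific to three classes and two sexes. As a rough sketch, the same post-processing could be wrapped in a helper that reads those sizes from the index. The function name `plot_paired_stacked`, its `width` parameter and the made-up `df` are assumptions for illustration, not part of pandas, and the sketch was written with the two-bars-per-group case in mind.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend, so the sketch runs without a display
import numpy as np
import pandas as pd


def plot_paired_stacked(shares, width=0.5):
    """Stacked bar plot in which the bars of each outer index group touch.

    Assumes `shares` has a sorted two-level index and the same number of
    inner rows in every outer group (two per group in the Titanic case).
    """
    outer = shares.index.get_level_values(0)
    inner = shares.index.get_level_values(1)
    n_bars = len(shares)
    n_outer = outer.nunique()
    n_inner = n_bars // n_outer

    ax = shares.plot.bar(stacked=True, width=width)

    # pandas adds one layer of n_bars rectangles per column, in column order.
    centers = []
    for i in range(n_bars):
        # Close the gap to the previous bar of the same outer group.
        new_x = ax.patches[i].get_x() - (i % n_inner) * (1 - width)
        for layer in range(shares.shape[1]):
            ax.patches[i + layer * n_bars].set_x(new_x)
        centers.append(new_x + width / 2)
    centers = np.array(centers)

    # Minor ticks label each bar, major ticks label each outer group.
    ax.set_xticks(centers, minor=True)
    ax.set_xticks(centers.reshape(n_outer, n_inner).mean(axis=1))
    ax.set_xticklabels([str(v) for v in inner], minor=True)
    ax.set_xticklabels([str(v) for v in outer.unique()])

    # Hide the tick marks and push the group labels below the bar labels.
    ax.tick_params(axis='x', which='both', bottom=False)
    ax.tick_params(axis='x', which='minor', labelrotation=45)
    ax.tick_params(axis='x', which='major', pad=35, labelrotation=0)
    return ax


# Made-up rows standing in for train.csv.
df = pd.DataFrame({
    'Pclass':   [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3],
    'Sex':      ['female', 'female', 'male', 'male',
                 'female', 'male', 'male',
                 'female', 'female', 'male', 'male', 'male'],
    'Survived': [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 1],
})
shares = df.groupby(['Pclass', 'Sex'])['Survived'].agg(['mean', lambda x: 1 - x.mean()])
shares.columns = ['Survived', 'Died']
ax = plot_paired_stacked(shares)
```

Like the snippet above, this relies on the order in which pandas adds rectangles to `ax.patches`, so it is a workaround rather than a supported API.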


