
Pandas and matplotlib stacked bar chart with major and minor x-ticks grouped together

I have the following data:

id, approach, outcome
a1, approach1, outcome1
a1, approach1, outcome2
a1, approach1, outcome2
a1, approach1, outcome2
a1, approach1, outcome2
a1, approach2, outcome1
a1, approach2, outcome1
a1, approach2, outcome1
a1, approach2, outcome1
a1, approach2, outcome1
a1, approach3, outcome1
a1, approach3, outcome1
a1, approach3, outcome1
a1, approach3, outcome1
a1, approach3, outcome1
a2, approach1, outcome2
a2, approach1, outcome1
a2, approach1, outcome1
a2, approach1, outcome2
a2, approach1, outcome1
a2, approach2, outcome1
a2, approach2, outcome1
a2, approach2, outcome2
a2, approach2, outcome1
a2, approach2, outcome2
a2, approach3, outcome2
a2, approach3, outcome2
a2, approach3, outcome1
a2, approach3, outcome2
a2, approach3, outcome1

I found the following chart from another user, which is exactly what I am trying to accomplish: [image]

But instead of fruits we have ids and instead of years we have approaches.

Here is what I have done so far:

import pandas
import matplotlib.pyplot as plt

df = pandas.read_csv("test.txt", sep=r',\s+', engine="python")
fig, ax = plt.subplots(1, 1, figsize=(5.5, 4))

data = df[df.approach == "approach1"].groupby(["id", "outcome"], sort=False)["outcome"].count().unstack(level=1)
data.plot.bar(width=0.5, position=0.6, color=["g", "r"], stacked=True, ax=ax)

data = df[df.approach == "approach2"].groupby(["id", "outcome"], sort=False)["outcome"].count().unstack(level=1)
data.plot.bar(width=0.5, position=-0.6, color=["g", "r"], stacked=True, ax=ax)

# "Activate" minor ticks
ax.minorticks_on()

rects_locs = []
p = 0
for patch in ax.patches:
    rects_locs.append(patch.get_x() + patch.get_width())
    # p += 0.01

# Set minor ticks there
ax.set_xticks(rects_locs, minor = True)

# Labels for the rectangles
new_ticks = ["Approach1"] * 10 + ["Approach2"] * 10

# Set the labels
from matplotlib import ticker
ax.xaxis.set_minor_formatter(ticker.FixedFormatter(new_ticks))  #add the custom ticks

# Move the category label further from x-axis
ax.tick_params(axis='x', which='major', pad=15)

# Remove ticks where not necessary (tick_params expects booleans, not 'off')
ax.tick_params(axis='x', which='both', top=False)
ax.tick_params(axis='y', which='both', left=False, right=False)
plt.xticks(rotation=0)

But the output is not nice: [image]

So basically I want to have id as the major x-tick (so there should be 2 such x values) and then for each id there should be 3 grouped stacked bars (approach1, approach2, approach3).

Well, I'm not proud of it. But it works. Hopefully somebody more knowledgeable will come along with a better solution.

I start by setting up your data:

import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import numpy as np
import pandas as pd

data = np.array([
'id', 'approach', 'outcome',
'a1', 'approach1', 'outcome1',
'a1', 'approach1', 'outcome2',
'a1', 'approach1', 'outcome2',
'a1', 'approach1', 'outcome2',
'a1', 'approach1', 'outcome2',
'a1', 'approach2', 'outcome1',
'a1', 'approach2', 'outcome1',
'a1', 'approach2', 'outcome1',
'a1', 'approach2', 'outcome1',
'a1', 'approach2', 'outcome1',
'a1', 'approach3', 'outcome1',
'a1', 'approach3', 'outcome1',
'a1', 'approach3', 'outcome1',
'a1', 'approach3', 'outcome1',
'a1', 'approach3', 'outcome1',
'a2', 'approach1', 'outcome2',
'a2', 'approach1', 'outcome1',
'a2', 'approach1', 'outcome1',
'a2', 'approach1', 'outcome2',
'a2', 'approach1', 'outcome1',
'a2', 'approach2', 'outcome1',
'a2', 'approach2', 'outcome1',
'a2', 'approach2', 'outcome2',
'a2', 'approach2', 'outcome1',
'a2', 'approach2', 'outcome2',
'a2', 'approach3', 'outcome2',
'a2', 'approach3', 'outcome2',
'a2', 'approach3', 'outcome1',
'a2', 'approach3', 'outcome2',
'a2', 'approach3', 'outcome1'])

data = data.reshape(data.size // 3, 3)

df = pd.DataFrame(data[1:], columns=data[0])

Next I count up all the occurrences of "outcome1" and "outcome2" for each approach and id. (I'm sure this could be done directly in pandas, but I'm a bit of a pandas novice):

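# Nested counts: counts[id][approach][outcome] -> number of matching rows.
# (Named counts and id_ to avoid shadowing the built-ins dict and id.)
counts = {}

for id_ in 'a1', 'a2':
    counts[id_] = {}
    for approach in 'approach1', 'approach2', 'approach3':
        counts[id_][approach] = {}
        for outcome in 'outcome1', 'outcome2':
            counts[id_][approach][outcome] = ((df['id'] == id_)
                                             & (df['approach'] == approach)
                                             & (df['outcome'] == outcome)).sum()

plot_data = pd.DataFrame(counts)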
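
The counting loop above can probably be replaced by a single pandas call. Here is a minimal sketch using pd.crosstab. The `sample` frame is only an illustrative subset of the question's data, and the name `outcome_counts` is made up for this example:

```python
import pandas as pd

# Illustrative subset of the question's data (not the full table).
sample = pd.DataFrame(
    [
        ("a1", "approach1", "outcome1"),
        ("a1", "approach1", "outcome2"),
        ("a1", "approach1", "outcome2"),
        ("a1", "approach2", "outcome1"),
        ("a2", "approach1", "outcome2"),
        ("a2", "approach2", "outcome1"),
        ("a2", "approach2", "outcome2"),
    ],
    columns=["id", "approach", "outcome"],
)

# One row per observed (id, approach) pair, one column per outcome.
# Outcomes that never occur for a pair are filled with 0.
outcome_counts = pd.crosstab(
    [sample["id"], sample["approach"]], sample["outcome"]
)
print(outcome_counts)
```

A single count is then available as `outcome_counts.loc[("a1", "approach1"), "outcome2"]`. The same call should work on the full df, since it only relies on the three column names.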

Now all that is left is to do the plotting.

fig, ax = plt.subplots(1, 1)

i = 0
for id_ in 'a1', 'a2':
    for approach in 'approach1', 'approach2', 'approach3':
        # Green segment for outcome1, red segment for outcome2 stacked on top
        ax.bar(i, plot_data[id_][approach]["outcome1"], color='g')
        ax.bar(i, plot_data[id_][approach]["outcome2"],
               bottom=plot_data[id_][approach]["outcome1"], color='r')
        i += 1
    i += 1  # leave a one-bar gap between the two ids

# Pin the ticks to the bar positions instead of relying on the default ticks
ax.set_xticks([0, 1, 2, 4, 5, 6])
ax.set_xticklabels(['approach1', 'approach2', 'approach3'] * 2, rotation=45)

custom_lines = [Line2D([0], [0], color='g', lw=4),
                Line2D([0], [0], color='r', lw=4)]

ax.legend(custom_lines, ['Outcome 1', 'Outcome 2'])

[image]
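
The chart above labels each bar with its approach but does not yet show the ids the question asks for as the outer grouping. One possible way to add them is to place a text label under the centre of each group. What follows is a sketch under assumptions: the counts are hard-coded from the question's data, the names `bar_counts`, `bar_positions` and `group_centers` are made up for this example, and the vertical offset of -0.3 is a guess that would need tuning for other font sizes or rotations.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

ids = ["a1", "a2"]
approaches = ["approach1", "approach2", "approach3"]
# (outcome1, outcome2) counts per approach, taken from the question's data.
bar_counts = {
    "a1": [(1, 4), (5, 0), (5, 0)],
    "a2": [(3, 2), (3, 2), (2, 3)],
}

fig, ax = plt.subplots()
bar_positions, group_centers = [], []
x = 0
for id_ in ids:
    start = x
    for n1, n2 in bar_counts[id_]:
        ax.bar(x, n1, color="g")             # outcome1 segment
        ax.bar(x, n2, bottom=n1, color="r")  # outcome2 stacked on top
        bar_positions.append(x)
        x += 1
    group_centers.append((start + x - 1) / 2)  # middle bar of this group
    x += 1  # gap between id groups

# Inner labels: one tick per bar, labelled with the approach.
ax.set_xticks(bar_positions)
ax.set_xticklabels(approaches * len(ids), rotation=45)

# Outer labels: one text per id, x in data units, y in axes fractions.
for center, id_ in zip(group_centers, ids):
    ax.text(center, -0.3, id_, ha="center", va="top",
            transform=ax.get_xaxis_transform())

fig.subplots_adjust(bottom=0.35)  # make room for both label rows
fig.savefig("grouped_stacked_bars.png")
```

Plain text labels are used here rather than a second tick level because the group centres coincide with the middle bars, and matplotlib may hide a minor tick that sits on a major one.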
