Creating grouped/stacked bar plots from multiple categories containing multiple labels in a pandas DataFrame

Question

I have the following pandas dataframe (df) [only an excerpt of the full dataframe]:

   Name    Cat_1    Cat_2
0   foo        P    Apples, Pears, Cats
1   bar     R, M    Apples
2   bla        E    Pears
3   blu        F    Cats, Pears
4   boo        G    Apples, Pears
5   faa     P, E    Apples, Cats

I would like to create bar plots that are built from Cat_1 and Cat_2. These columns contain multiple tags, which have to be used for plotting.

Currently, I am running this simple code to plot Cat_1:

import pandas as pd
from matplotlib import pyplot as plt

fig, ax = plt.subplots(figsize = (4,4))
s = df["Cat_1"].str.split(", ", expand = True).stack()
s.value_counts().plot(kind = 'bar', ax = ax)

This returns a nice bar plot with one bar for each of the different labels in Cat_1, allowing multiple assignments per row (as intended).

One could apply the same to Cat_2 and obtain a separate plot with the respective labels.

However, I want to have a single plot that is first "stacked" by Cat_1, with the values of Cat_2 then counted within each Cat_1 label.

I guess a way to think of this is to build a nested dictionary that would look like the following:

{"P": {"Apples": 2, "Pears": 1, "Cats": 2}, "R": {"Apples": 1}, ....}

but at the same time keep track of the total count of Cat_1.
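For illustration, such a nested dictionary and the Cat_1 totals could be derived roughly as follows. This is only a sketch, assuming pandas 0.25 or later (for explode); the names pairs, counts, nested and totals are made up here and are not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["foo", "bar", "bla", "blu", "boo", "faa"],
    "Cat_1": ["P", "R, M", "E", "F", "G", "P, E"],
    "Cat_2": ["Apples, Pears, Cats", "Apples", "Pears",
              "Cats, Pears", "Apples, Pears", "Apples, Cats"],
})

# Turn the comma-separated strings into lists of tags
pairs = df.assign(Cat_1=df["Cat_1"].str.split(", "),
                  Cat_2=df["Cat_2"].str.split(", "))

# One row per (Cat_1 tag, Cat_2 tag) pair; the index is reset between the
# two explode calls so that the second one never sees duplicate labels
pairs = pairs.explode("Cat_1").reset_index(drop=True)
pairs = pairs.explode("Cat_2").reset_index(drop=True)

# Rows are Cat_1 tags, columns are Cat_2 tags, cells are pair counts
counts = pd.crosstab(pairs["Cat_1"], pairs["Cat_2"])

# Nested dictionary in the shape sketched above (zero counts dropped)
nested = {
    tag: {label: int(n) for label, n in row.items() if n > 0}
    for tag, row in counts.iterrows()
}

# Total number of rows carrying each Cat_1 tag, kept separately
totals = df["Cat_1"].str.split(", ").explode().value_counts()
```

Here counts is already in a shape that can be plotted directly, while totals holds the per-tag row counts that the stacked heights alone do not show.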

It does not matter whether it's a grouped or stacked bar chart in the end.

Please take a look at the enclosed figure for a more visual idea:

Answer 1

This should get you pretty close if I understand correctly.

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(columns=['Name', 'Cat_1', 'Cat_2'])

df['Name'] = ['foo', 'bar', 'bla', 'blu', 'boo', 'faa']
df['Cat_1'] = ['P', 'R, M', 'E', 'F', 'G', 'P, E']
df['Cat_2'] = ['Apples, Pears, Cats', 'Apples', 'Pears', 'Cats, Pears', 'Apples, Pears', 'Apples, Cats']

# set up a table (rows: Cat_2 tags, columns: Cat_1 tags) prepopulated with zeros
df_pl = pd.DataFrame(columns=df["Cat_1"].str.split(", ", expand=True).stack().unique().tolist(),
                     index=df["Cat_2"].str.split(", ", expand=True).stack().unique().tolist(),
                     data=0)

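# get chunk size for each combination
# note: str.contains matches substrings, so no tag may be contained in another
for x in df_pl.columns:
    ind = df.Cat_1.str.contains(x, regex=False)
    for name in df_pl.index:
        # DataFrame.set_value was removed in pandas 1.0; .at is the replacement
        df_pl.at[name, x] = df.loc[ind, 'Cat_2'].str.contains(name, regex=False).sum()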

N = len(df_pl.columns)
ind = np.arange(N)    # the x locations for the groups
width = 0.35          # the width of the bars: can also be len(x) sequence

bottoms = np.zeros(N)  # running height of each stack, one entry per Cat_1 tag
p = {}
for name in df_pl.index:
    heights = df_pl.loc[name].to_numpy(dtype=float)
    p[name] = plt.bar(ind, heights, width, bottom=bottoms)
    bottoms += heights  # the next segment starts on top of this one

plt.ylabel('y_label')
plt.title('some plot')
plt.xticks(ind, df_pl.columns.tolist())
plt.legend(p.values(), p.keys())

plt.show()
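
As an alternative to the explicit loops, the same table can probably be built with pd.crosstab and stacked by pandas' own plotting. This is a minimal sketch, assuming pandas 0.25 or later (for explode); tag_table is a hypothetical helper name, not a pandas function.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "Name": ["foo", "bar", "bla", "blu", "boo", "faa"],
    "Cat_1": ["P", "R, M", "E", "F", "G", "P, E"],
    "Cat_2": ["Apples, Pears, Cats", "Apples", "Pears",
              "Cats, Pears", "Apples, Pears", "Apples, Cats"],
})


def tag_table(frame, row_col, col_col, sep=", "):
    """Count tag co-occurrences of two multi-tag string columns (hypothetical helper)."""
    pairs = frame[[row_col, col_col]].copy()
    for col in (row_col, col_col):
        # Split the string into a list of tags, then give each tag its own row
        pairs[col] = pairs[col].str.split(sep)
        pairs = pairs.explode(col).reset_index(drop=True)
    return pd.crosstab(pairs[row_col], pairs[col_col])


table = tag_table(df, "Cat_1", "Cat_2")

fig, ax = plt.subplots(figsize=(4, 4))
# One bar per Cat_1 tag, one stacked segment per Cat_2 tag
table.plot(kind="bar", stacked=True, ax=ax)
ax.set_ylabel("count")
```

Calling plt.show() afterwards displays the figure, and stacked=False gives grouped bars instead. Note that the height of each stacked bar is the number of tag pairs, not the number of rows carrying that Cat_1 tag, so the Cat_1 totals would have to be shown separately.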

Answer 1
1 accepted, 2018-04-24 12:09:26