How to plot/manage 2-column categorical data using pandas/matplotlib?

I have a dataset representing a bunch of posts. Each post has one of 4 categories and one of 6 results.

I want to see how many posts have each of the 6 result types within each of the 4 categories.

I used

df = df.groupby(["Category", "Result"]).size().reset_index(name='Count')

to get a three-column dataframe (Category, Result, Count) with the necessary counts. Now I want to plot a grouped bar chart covering all the categories, with the categories as the x ticks and six bars per category, one for each result.

How can I achieve this?
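To make the shape of the three-column frame concrete, here is a minimal sketch on a small made-up DataFrame (the six posts are hypothetical, not the asker's data). It runs the same groupby/size/reset_index call and shows the key property of this long format: a (Category, Result) pair that never occurs gets no row at all, rather than a row with a count of 0.

```python
import pandas as pd

# Hypothetical toy data: six posts, each with one category and one result
posts = pd.DataFrame({
    "Category": ["A", "A", "B", "A", "B", "C"],
    "Result":   [1,   2,   1,   1,   3,   2],
})

# Same call as in the question: count posts per (Category, Result) pair
counts = posts.groupby(["Category", "Result"]).size().reset_index(name="Count")
print(counts)

# Pairs that never occur (e.g. Category "B" with Result 2) have no row here,
# so any later reshaping step has to fill those gaps with 0 explicitly.
```

This missing-row behaviour is why a reshaping step with an explicit fill value of 0 is needed before plotting.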

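It could be a good idea to create a pivot table from the counts dataframe, with one row per category and one column per result. pandas' built-in DataFrame.plot(kind="bar") then draws one group of bars per row (the x ticks) and one bar per column within each group, which is exactly the grouped layout you want.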

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Sample data: 100 posts, each with one of 4 categories and one of 6 results
rng = np.random.default_rng(0)
cats = rng.choice(list("ABCD"), 100, p=[0.3, 0.1, 0.4, 0.2])
res = rng.choice(np.arange(1, 7), 100, p=[0.2, 0.1, 0.08, 0.16, 0.26, 0.2])
df = pd.DataFrame({"Category": cats, "Result": res})

# Long format: one row per (Category, Result) pair with its count
df2 = df.groupby(["Category", "Result"]).size().reset_index(name="Count")

# Wide format: rows = categories, columns = results (missing pairs -> 0)
df3 = pd.pivot_table(df2, values="Count", index="Category", columns="Result",
                     aggfunc="sum", fill_value=0)
# Transposed layout: rows = results, columns = categories
df4 = pd.pivot_table(df2, values="Count", index="Result", columns="Category",
                     aggfunc="sum", fill_value=0)

fig, ax = plt.subplots(1, 2, figsize=(10, 4))
df3.plot(kind="bar", ax=ax[0])  # x ticks = categories, 6 bars (results) each
df4.plot(kind="bar", ax=ax[1])  # x ticks = results, 4 bars (categories) each

plt.show()
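A pivot table is not the only route to that wide layout. As a minimal sketch on hypothetical toy data (not the asker's dataset), the same categories-by-results table can be built straight from the raw posts, either by unstacking the grouped counts or with pd.crosstab. This swaps in different pandas calls than pivot_table; both fill missing pairs with 0, and the resulting frame plugs directly into DataFrame.plot(kind="bar").

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data: one row per post
posts = pd.DataFrame({
    "Category": ["A", "A", "B", "A", "B", "C"],
    "Result":   [1,   2,   1,   1,   3,   2],
})

# Route 1: count per pair, then move the Result level into the columns
wide = posts.groupby(["Category", "Result"]).size().unstack(fill_value=0)

# Route 2: crosstab does the counting and the reshaping in one call
wide_ct = pd.crosstab(posts["Category"], posts["Result"])

# One group of bars per category (x ticks), one bar per result
ax = wide.plot(kind="bar", figsize=(6, 4))
ax.set_xlabel("Category")
ax.set_ylabel("Count")
ax.legend(title="Result")
plt.tight_layout()
plt.show()
```

unstack(fill_value=0) reuses the groupby you already have, while crosstab is the shortest spelling when counts are all you need.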

