

Plot groupby of groupby pandas

The data is a time series, with many member ids associated with many categories:

import pandas as pd

data_df = pd.DataFrame({'Date': ['2018-09-14 00:00:22',
                                 '2018-09-14 00:01:46',
                                 '2018-09-14 00:01:56',
                                 '2018-09-14 00:01:57',
                                 '2018-09-14 00:01:58',
                                 '2018-09-14 00:02:05'],
                        'category': [1, 1, 1, 2, 2, 2],
                        'member': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe'],
                        'data': ['23', '20', '20', '11', '16', '62']})
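Note that in this sample both 'Date' and 'data' hold strings. Assuming the real data is stored the same way, a minimal sketch of converting them before plotting, so that the dates end up on a real time axis and the 'data' column is numeric (pandas' plot() would most likely reject it as non-numeric otherwise):

```python
import pandas as pd

data_df = pd.DataFrame({'Date': ['2018-09-14 00:00:22', '2018-09-14 00:01:46',
                                 '2018-09-14 00:01:56', '2018-09-14 00:01:57',
                                 '2018-09-14 00:01:58', '2018-09-14 00:02:05'],
                        'category': [1, 1, 1, 2, 2, 2],
                        'member': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe'],
                        'data': ['23', '20', '20', '11', '16', '62']})

# Parse the timestamp strings into datetime64 values
data_df['Date'] = pd.to_datetime(data_df['Date'])
# Turn the numeric strings ('23', '20', ...) into actual numbers
data_df['data'] = pd.to_numeric(data_df['data'])

print(data_df.dtypes)
```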

There are about 50 categories with 30 members, each with around 1000 datapoints.

I am trying to make one plot per category.

I subset the data to a single category and then plot it with:

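import matplotlib.pyplot as plt

category = data_df[data_df['category'] == 1]  # rows for a single category
fig, ax = plt.subplots(figsize=(8,6))
for i, g in category.groupby('member'):  # one line per member
    g.plot(y='data', ax=ax, label=str(i))

plt.show()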

This works fine for a single category; however, when I try to use a for loop to repeat this for each category, it does not work:

tests = pd.DataFrame()
for category in categories:
    tests = df.loc[df['category'] == category]
    for test in tests:
        fig, ax = plt.subplots(figsize=(8,6))
        for i, g in category.groupby(['member']):
            g.plot(y='data', ax=ax, label=str(i))

            plt.show()

This raises "AttributeError: 'str' object has no attribute 'groupby'".
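The error comes from the loop variable: `for category in categories` binds `category` to a single label (a string, judging by the message), not to the subset, which lives in `tests`. Likewise, iterating over a DataFrame with `for test in tests` yields its column names, not rows or groups. A short sketch on the sample data (with numeric labels, so the label here is a NumPy integer rather than a string) shows both effects:

```python
import pandas as pd

df = pd.DataFrame({'category': [1, 1, 1, 2, 2, 2],
                   'member': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe'],
                   'data': [23, 20, 20, 11, 16, 62]})

categories = df['category'].unique()
for category in categories:
    # `category` is only the label, not a DataFrame, so it has no .groupby()
    tests = df.loc[df['category'] == category]
    # `tests` is the per-category subset: this is what should be grouped
    print(category, type(tests).__name__)

# Iterating a DataFrame directly yields its column labels
print([test for test in tests])
```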

What I would like is a loop that produces one graph per category, with all the members' data plotted on each graph.
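A minimal sketch of such a loop, assuming the sample `data_df` with 'Date' parsed to datetimes and 'data' converted to numbers: the outer `groupby('category')` hands over each category's sub-DataFrame, and the inner `groupby('member')` draws one line per member on that category's axes. The `Agg` backend and the `figures` dict are assumptions made only so the sketch runs without a display; interactively you would call `plt.show()` instead.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch only
import matplotlib.pyplot as plt
import pandas as pd

data_df = pd.DataFrame({'Date': ['2018-09-14 00:00:22', '2018-09-14 00:01:46',
                                 '2018-09-14 00:01:56', '2018-09-14 00:01:57',
                                 '2018-09-14 00:01:58', '2018-09-14 00:02:05'],
                        'category': [1, 1, 1, 2, 2, 2],
                        'member': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe'],
                        'data': ['23', '20', '20', '11', '16', '62']})
data_df['Date'] = pd.to_datetime(data_df['Date'])
data_df['data'] = pd.to_numeric(data_df['data'])

figures = {}
for cat, cat_df in data_df.groupby('category'):
    # One figure per category
    fig, ax = plt.subplots(figsize=(8, 6))
    for member, member_df in cat_df.groupby('member'):
        # One line per member, with time on the x-axis
        member_df.plot(x='Date', y='data', ax=ax, label=member)
    ax.set_title('Category {}'.format(cat))
    figures[cat] = fig

# plt.show()  # use this in an interactive session
```

The key difference from the failing loop is that `groupby` yields the (label, subset) pairs directly, so there is no separate label variable to confuse with the data.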

I'm far from a pandas expert, but if you run the following simple snippet

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Date': ['2018-09-14 00:00:22',
                            '2018-09-14 00:01:46',
                            '2018-09-14 00:01:56',
                            '2018-09-14 00:01:57',
                            '2018-09-14 00:01:58',
                            '2018-09-14 00:02:05'],
                   'category': [1, 1, 1, 2, 2, 2],
                   'Id': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe'],
                   'data': ['23', '20', '20', '11', '16', '62']})
fig, ax = plt.subplots()
for item in df.groupby('category'):
    ax.plot([float(x) for x in item[1]['category']],
            [float(x) for x in item[1]['data'].values],
            linestyle='none', marker='D')
plt.show()

you get a figure with the data values plotted as diamond markers against their category.

But there is probably a better way.
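One small improvement on the snippet above: iterating over `df.groupby('category')` yields `(key, sub_dataframe)` pairs, which is why the code has to index `item[1]`. Unpacking the pair in the `for` statement reads more clearly. A minimal sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'category': [1, 1, 1, 2, 2, 2],
                   'Id': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe'],
                   'data': ['23', '20', '20', '11', '16', '62']})

keys = []
for key, group in df.groupby('category'):
    # `key` is the category value, `group` holds only that category's rows
    keys.append(key)
    print(key, group['Id'].tolist(), group['data'].astype(float).tolist())
```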

EDIT: Based on the changes made to your question, I changed my snippet to

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': ['2018-09-14 00:00:22',
                            '2018-09-14 00:01:46',
                            '2018-09-14 00:01:56',
                            '2018-09-14 00:01:57',
                            '2018-09-14 00:01:58',
                            '2018-09-14 00:02:05'],
                   'category': [1, 1, 1, 2, 2, 2],
                   'Id': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe'],
                   'data': ['23', '20', '20', '11', '16', '62']})
fig, ax = plt.subplots(nrows=np.unique(df['category']).size)
for i, item in enumerate(df.groupby('category')):
    ax[i].plot([str(x) for x in item[1]['Id']],
               [float(x) for x in item[1]['data'].values],
               linestyle='none', marker='D')
    ax[i].set_title('Category {}'.format(item[1]['category'].values[0]))
fig.tight_layout()
plt.show()

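which now displays one subplot per category, titled "Category 1" and "Category 2", each plotting the members' data values against their Id.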
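One caveat with `plt.subplots(nrows=...)`: when there is only one category, it returns a single Axes object rather than an array, so `ax[i]` fails. Passing `squeeze=False` always returns a 2-D array. A minimal sketch of the same loop with that change, on a frame with a single category, using the `Agg` backend only so it runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch only
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'category': [1, 1, 1],
                   'Id': ['bob', 'joe', 'jim'],
                   'data': ['23', '20', '20']})

groups = df.groupby('category')
# squeeze=False: `axes` is always a 2-D array of shape (nrows, 1)
fig, axes = plt.subplots(nrows=groups.ngroups, squeeze=False)
for i, (key, group) in enumerate(groups):
    ax = axes[i, 0]
    ax.plot(group['Id'].astype(str), group['data'].astype(float),
            linestyle='none', marker='D')
    ax.set_title('Category {}'.format(key))
fig.tight_layout()
```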

Creating your dataframe:

import pandas as pd

data_df = pd.DataFrame({'Date': ['2018-09-14 00:00:22',
                            '2018-09-14 00:01:46',
                            '2018-09-14 00:01:56',
                            '2018-09-14 00:01:57',
                            '2018-09-14 00:01:58',
                            '2018-09-14 00:02:05'],
                    'category': [1, 1, 1, 2, 2, 2],
                    'member': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe'],
                    'data': ['23', '20', '20', '11', '16', '62']})

then [EDIT after comments]

import matplotlib.pyplot as plt
import numpy as np

subplots_n = np.unique(data_df['category']).size
subplots_x = np.round(np.sqrt(subplots_n)).astype(int)
subplots_y = np.ceil(np.sqrt(subplots_n)).astype(int)
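The grid here is round(√n) rows by ceil(√n) columns, and the product is always at least n: if the fractional part of √n is at least 0.5, both factors equal ceil(√n); otherwise, writing √n = k + f with f < 0.5 gives n < k² + k + 0.25, so n ≤ k(k + 1). A quick check, using a hypothetical helper `grid_shape` that repeats the same computation:

```python
import numpy as np

def grid_shape(n):
    """Hypothetical helper: (rows, cols) computed as in the answer."""
    rows = int(np.round(np.sqrt(n)))
    cols = int(np.ceil(np.sqrt(n)))
    return rows, cols

for n in (2, 5, 50):
    print(n, grid_shape(n))  # 2 (1, 2), 5 (2, 3), 50 (7, 8)
```

So the roughly 50 categories from the question would fit in a 7 x 8 grid.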

for i, category in enumerate(data_df.groupby('category')):
    category_df = pd.DataFrame(category[1])
    x = [str(x) for x in category_df['member']]
    y = [float(x) for x in category_df['data']]
    plt.subplot(subplots_x, subplots_y, i+1)
    plt.plot(x, y)
    plt.title("Category {}".format(category_df['category'].values[0]))

plt.tight_layout()
plt.show()

yields

[Figure: pandas groupby subplots]

Note that this also handles a larger number of categories nicely, for example:

data_df2 = pd.DataFrame({'category': [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5],
                         'member': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe', 'ric', 'mat', 'pip', 'zoe', 'qui', 'quo', 'qua'],
                         'data': ['23', '20', '20', '11', '16', '62', '34', '27', '12', '7', '9', '13', '7']})
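To run the loop on `data_df2` without editing it, one option is to wrap it in a function that takes the frame as an argument. A minimal sketch with a hypothetical `plot_categories` helper that returns the figure instead of calling `plt.show()`; it uses the object-oriented `fig.add_subplot` in place of `plt.subplot` with the same grid, and the `Agg` backend only so it runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # headless backend for this sketch only
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_categories(df):
    """Hypothetical helper: one subplot per category, members on the x-axis."""
    n = df['category'].nunique()
    rows = int(np.round(np.sqrt(n)))
    cols = int(np.ceil(np.sqrt(n)))
    fig = plt.figure()
    for i, (key, group) in enumerate(df.groupby('category')):
        ax = fig.add_subplot(rows, cols, i + 1)
        ax.plot(group['member'].astype(str), group['data'].astype(float))
        ax.set_title('Category {}'.format(key))
    fig.tight_layout()
    return fig

data_df2 = pd.DataFrame({'category': [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5],
                         'member': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe',
                                    'ric', 'mat', 'pip', 'zoe', 'qui', 'quo', 'qua'],
                         'data': ['23', '20', '20', '11', '16', '62', '34',
                                  '27', '12', '7', '9', '13', '7']})
fig = plot_categories(data_df2)
```

Returning the figure keeps the function reusable: the caller can show it, save it with `fig.savefig(...)`, or inspect it.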

[Figure: pandas groupby subplots]
