Plotting a pandas groupby

Question

The data is a time series in which many member IDs are associated with many categories:

data_df = pd.DataFrame({'Date': ['2018-09-14 00:00:22',
                            '2018-09-14 00:01:46',
                            '2018-09-14 00:01:56',
                            '2018-09-14 00:01:57',
                            '2018-09-14 00:01:58',
                            '2018-09-14 00:02:05'],
                    'category': [1, 1, 1, 2, 2, 2],
                    'member': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe'],
                    'data': ['23', '20', '20', '11', '16', '62']})

There are about 50 categories with 30 members, each with around 1000 data points.

I am trying to make one plot per category.

I subset a single category into a DataFrame named category and then plot it via:

fig, ax = plt.subplots(figsize=(8,6))
for i, g in category.groupby(['member']):
    g.plot(y='data', ax=ax, label=str(i))

plt.show()

This works fine for a single category. However, when I try to use a for loop to repeat this for each category, it does not work:

tests = pd.DataFrame()
for category in categories:
    tests = df.loc[df['category'] == category]
    for test in tests:
        fig, ax = plt.subplots(figsize=(8,6))
        for i, g in category.groupby(['member']):
            g.plot(y='data', ax=ax, label=str(i))

            plt.show()

This yields the error "AttributeError: 'str' object has no attribute 'groupby'".

What I would like is a loop that produces one graph per category, with all the members' data plotted on each graph.

Answer 1

I am far from an expert with pandas, but if you execute the following simple snippet

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Date': ['2018-09-14 00:00:22',
                            '2018-09-14 00:01:46',
                            '2018-09-14 00:01:56',
                            '2018-09-14 00:01:57',
                            '2018-09-14 00:01:58',
                            '2018-09-14 00:02:05'],
                   'category': [1, 1, 1, 2, 2, 2],
                   'Id': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe'],
                   'data': ['23', '20', '20', '11', '16', '62']})
fig, ax = plt.subplots()
for item in df.groupby('category'):
    ax.plot([float(x) for x in item[1]['category']],
            [float(x) for x in item[1]['data'].values],
            linestyle='none', marker='D')
plt.show()

you produce a figure with the data values plotted as diamond markers against the category number.

But there is probably a better way.
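One possible tidier variant, offered as a sketch rather than a definitive answer: convert the string columns once with pd.to_numeric and pd.to_datetime, then unpack each (key, group) pair that groupby yields instead of indexing item[1]. It uses the question's column names (the snippet above calls the member column 'Id'), and the Agg backend line is an assumption made only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line to show windows
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'Date': ['2018-09-14 00:00:22',
                            '2018-09-14 00:01:46',
                            '2018-09-14 00:01:56',
                            '2018-09-14 00:01:57',
                            '2018-09-14 00:01:58',
                            '2018-09-14 00:02:05'],
                   'category': [1, 1, 1, 2, 2, 2],
                   'member': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe'],
                   'data': ['23', '20', '20', '11', '16', '62']})

# Convert the string columns once, instead of casting inside the plotting loop.
df['Date'] = pd.to_datetime(df['Date'])
df['data'] = pd.to_numeric(df['data'])

fig, ax = plt.subplots()
# groupby yields (key, sub-DataFrame) pairs, so they can be unpacked directly.
for category, group in df.groupby('category'):
    ax.plot(group['category'], group['data'],
            linestyle='none', marker='D', label='Category {}'.format(category))
ax.legend()
plt.show()
```

After the conversion, the 'data' column holds integers and 'Date' holds timestamps, so both can be passed to matplotlib without per-element casts.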

EDIT: Based on the changes made to your question, I changed my snippet to

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({'Date': ['2018-09-14 00:00:22',
                            '2018-09-14 00:01:46',
                            '2018-09-14 00:01:56',
                            '2018-09-14 00:01:57',
                            '2018-09-14 00:01:58',
                            '2018-09-14 00:02:05'],
                   'category': [1, 1, 1, 2, 2, 2],
                   'Id': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe'],
                   'data': ['23', '20', '20', '11', '16', '62']})
fig, ax = plt.subplots(nrows=np.unique(df['category']).size)
for i, item in enumerate(df.groupby('category')):
    ax[i].plot([str(x) for x in item[1]['Id']],
               [float(x) for x in item[1]['data'].values],
               linestyle='none', marker='D')
    ax[i].set_title('Category {}'.format(item[1]['category'].values[0]))
fig.tight_layout()
plt.show()

which now displays one subplot per category, with the Id values on the x-axis and the data values as diamond markers.
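For the layout the question actually asks for (a separate figure per category, with one line per member), a nested groupby may be closer. The AttributeError in the question arises because the loop variable category is a value taken from categories (a string, according to the error message), not a DataFrame, so it has no groupby method. The sketch below is a hedged illustration: the function name plot_per_category is hypothetical, the small DataFrame is invented sample data in the question's column layout, and the Agg backend line is only there so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line to show windows
import matplotlib.pyplot as plt
import pandas as pd


def plot_per_category(df):
    """Return {category: Figure}, one figure per category, one line per member.

    Hypothetical helper; expects 'Date', 'category', 'member' and 'data' columns.
    """
    figures = {}
    # Outer groupby: one sub-DataFrame (and one figure) per category.
    for category, category_df in df.groupby('category'):
        fig, ax = plt.subplots(figsize=(8, 6))
        # Inner groupby: one line per member, drawn on the same axes.
        for member, member_df in category_df.groupby('member'):
            ax.plot(member_df['Date'], member_df['data'],
                    marker='o', label=str(member))
        ax.set_title('Category {}'.format(category))
        ax.legend()
        figures[category] = fig
    return figures


# Illustrative sample data only (not the asker's real data set).
df = pd.DataFrame({'Date': ['2018-09-14 00:00:22', '2018-09-14 00:01:46',
                            '2018-09-14 00:01:56', '2018-09-14 00:01:57',
                            '2018-09-14 00:01:58', '2018-09-14 00:02:05'],
                   'category': [1, 1, 1, 2, 2, 2],
                   'member': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe'],
                   'data': ['23', '20', '20', '11', '16', '62']})
df['Date'] = pd.to_datetime(df['Date'])
df['data'] = pd.to_numeric(df['data'])

figures = plot_per_category(df)
plt.show()
```

With around 50 categories this opens around 50 figures at once, so it may be preferable to save each figure with fig.savefig and release it with plt.close(fig) inside the loop.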

Answer 2

Creating your dataframe:

import pandas as pd

data_df = pd.DataFrame({'Date': ['2018-09-14 00:00:22',
                            '2018-09-14 00:01:46',
                            '2018-09-14 00:01:56',
                            '2018-09-14 00:01:57',
                            '2018-09-14 00:01:58',
                            '2018-09-14 00:02:05'],
                    'category': [1, 1, 1, 2, 2, 2],
                    'member': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe'],
                    'data': ['23', '20', '20', '11', '16', '62']})

then [EDIT after comments]

import matplotlib.pyplot as plt
import numpy as np

subplots_n = np.unique(data_df['category']).size
subplots_x = np.round(np.sqrt(subplots_n)).astype(int)
subplots_y = np.ceil(np.sqrt(subplots_n)).astype(int)

for i, category in enumerate(data_df.groupby('category')):
    category_df = pd.DataFrame(category[1])
    x = [str(x) for x in category_df['member']]
    y = [float(x) for x in category_df['data']]
    plt.subplot(subplots_x, subplots_y, i+1)
    plt.plot(x, y)
    plt.title("Category {}".format(category_df['category'].values[0]))

plt.tight_layout()
plt.show()

yields one subplot per category, each titled with its category number.

Please note that this also handles larger numbers of groups nicely, for example:

data_df2 = pd.DataFrame({'category': [1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5],
                    'member': ['bob', 'joe', 'jim', 'sally', 'jane', 'doe', 'ric', 'mat', 'pip', 'zoe', 'qui', 'quo', 'qua'],
                    'data': ['23', '20', '20', '11', '16', '62', '34', '27', '12', '7', '9', '13', '7']})
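The reason this scales is the grid arithmetic: rounding the square root for one dimension and taking its ceiling for the other gives a near-square grid with at least as many cells as categories. A small standalone sketch of that calculation follows; the function name grid_shape is hypothetical, and it uses the standard library's math module in place of the NumPy calls above.

```python
import math


def grid_shape(n):
    """Return (rows, cols) for a near-square subplot grid holding n plots.

    Hypothetical helper mirroring the subplots_x / subplots_y computation.
    """
    root = math.sqrt(n)
    rows = int(round(root))  # nearest integer to sqrt(n)
    cols = math.ceil(root)   # smallest integer >= sqrt(n)
    return rows, cols


# The 2-category example, the 5-category example, and the asker's ~50 categories.
for n in (2, 5, 50):
    rows, cols = grid_shape(n)
    print(n, rows, cols, rows * cols >= n)
```

For the five categories in data_df2 this gives a 2 x 3 grid, which leaves one cell empty.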

Answer 1
1 2019-11-29 22:56:07

Answer 2
1 accepted 2019-12-01 13:47:54
