简体   繁体   English

在大熊猫中绘制groupby操作的结果

[英]Plot the result of a groupby operation in pandas

I have this sample table: 我有此示例表:

    ID   Date      Days Volume/Day
0   111 2016-01-01  20  50
1   111 2016-02-01  25  40
2   111 2016-03-01  31  35
3   111 2016-04-01  30  30
4   111 2016-05-01  31  25
5   111 2016-06-01  30  20
6   111 2016-07-01  31  20
7   111 2016-08-01  31  15
8   111 2016-09-01  29  15
9   111 2016-10-01  31  10
10  111 2016-11-01  29  5
11  111 2016-12-01  27  0
0   112 2016-01-01  31  55
1   112 2016-02-01  26  45
2   112 2016-03-01  31  40
3   112 2016-04-01  30  35
4   112 2016-04-01  31  30
5   112 2016-05-01  30  25
6   112 2016-06-01  31  25
7   112 2016-07-01  31  20
8   112 2016-08-01  30  20
9   112 2016-09-01  31  15
10  112 2016-11-01  29  10
11  112 2016-12-01  31  0

I'm trying to make my table final table look like this below after grouping by ID and Date. 在按ID和日期分组后,我试图使我的表最终表如下所示。

    ID   Date      CumDays  Volume/Day
0   111 2016-01-01  20       50
1   111 2016-02-01  45       40
2   111 2016-03-01  76       35
3   111 2016-04-01  106      30
4   111 2016-05-01  137      25
5   111 2016-06-01  167      20
6   111 2016-07-01  198      20
7   111 2016-08-01  229      15
8   111 2016-09-01  258      15
9   111 2016-10-01  289      10
10  111 2016-11-01  318      5
11  111 2016-12-01  345      0
0   112 2016-01-01  31       55
1   112 2016-02-01  57       45
2   112 2016-03-01  88       40
3   112 2016-04-01  118      35
4   112 2016-05-01  149      30
5   112 2016-06-01  179      25
6   112 2016-07-01  210      25
7   112 2016-08-01  241      20
8   112 2016-09-01  271      20
9   112 2016-10-01  302      15
10  112 2016-11-01  331      10
11  112 2016-12-01  362       0

Next, I want to be able to extract the first value of Volume/Day per ID, all the CumDays values and all the Volume/Day values per ID and Date. 接下来,我希望能够提取每个ID的“体积/天”的第一个值,所有CumDays值以及每个ID和“日期”的所有“体积/天”的值。 So I can use them for further computation and plotting Volume/Day vs CumDays. 因此,我可以使用它们进行进一步的计算,并绘制“体积/天”与“累积日”的关系。 Example for ID:111, the first value of Volume/Day will be only 50 and ID:112, it will be only 55. All CumDays values for ID:111 will be 20,45... and ID:112, it will be 31,57...For all Volume/Day --- ID:111, will be 50, 40... and ID:112 will be 55,45... 以ID:111为例,“交易量/天”的第一个值将仅为50,而ID:112的第一个值将仅为55。ID:111的所有CumDays值将分别为20,45 ...和ID:112,将是31,57 ...对于所有交易量/天-ID:111将是50、40 ...而ID:112将是55,45 ...

My solution: 我的解决方案:

def get_time_rate(grp_df):
    t = grp_df['Days'].cumsum()
    r = grp_df['Volume/Day']
    return t,r

vals = df.groupby(['ID','Date']).apply(get_time_rate)
vals

Doing this, the cumulative calculation doesn't take effect at all. 这样做,累积计算根本不起作用。 It returns the original Days value. 它返回原始的天数值。 This didn't allow me move further in extracting the first value of Volume/Day, all the CumDays values and all the Volume/Day values I need. 这不允许我进一步提取第一个“体积/天”值,所有CumDays值和所有我需要的“体积/天”值。 Any advice or help on how to go about it will be appreciated. 任何建议或帮助如何进行将不胜感激。 Thanks 谢谢

Get a groupby object. 获取一个groupby对象。

g = df.groupby('ID')

Compute columns with transform : transform计算列:

df['CumDays'] = g.Days.transform('cumsum')
df['First Volume/Day'] = g['Volume/Day'].transform('first')
df

     ID        Date  Days  Volume/Day  CumDays  First Volume/Day
0   111  2016-01-01    20          50       20                50
1   111  2016-02-01    25          40       45                50
2   111  2016-03-01    31          35       76                50
3   111  2016-04-01    30          30      106                50
4   111  2016-05-01    31          25      137                50
5   111  2016-06-01    30          20      167                50
6   111  2016-07-01    31          20      198                50
7   111  2016-08-01    31          15      229                50
8   111  2016-09-01    29          15      258                50
9   111  2016-10-01    31          10      289                50
10  111  2016-11-01    29           5      318                50
11  111  2016-12-01    27           0      345                50
0   112  2016-01-01    31          55       31                55
1   112  2016-01-02    26          45       57                55
2   112  2016-01-03    31          40       88                55
3   112  2016-01-04    30          35      118                55
4   112  2016-01-05    31          30      149                55
5   112  2016-01-06    30          25      179                55
6   112  2016-01-07    31          25      210                55
7   112  2016-01-08    31          20      241                55
8   112  2016-01-09    30          20      271                55
9   112  2016-01-10    31          15      302                55
10  112  2016-01-11    29          10      331                55
11  112  2016-01-12    31           0      362                55

If you want grouped plots, you can iterate over each groups after grouping by ID . 如果要对图进行分组,可以在按ID分组后遍历每个组。 To plot, first set index and call plot . 要进行绘图,首先设置索引并调用plot

fig, ax = plt.subplots(figsize=(8,6))
for i, g in df2.groupby('ID'):
    g.plot(x='CumDays', y='Volume/Day', ax=ax, label=str(i))

plt.show()

在此处输入图片说明

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM