
Pandas df groupby dates period multiple

I have a dataframe with the columns date, createdAt, and amount. date and createdAt are dates, and amount is a float. Example:

1    2020-01, 2020-01,  35.50
2    2020-02, 2020-01,  35.50
3    2020-03, 2020-01,  35.50
4    2020-04, 2020-01,  35.50
5    2020-05, 2020-01,  35.50
6    2020-01, 2020-01,  35.50
7    2020-02, 2020-01,  35.50
8    2020-03, 2020-01,  35.50
9    2020-04, 2020-01,  35.50
10    2020-05, 2020-01,  35.50
11    2020-01, 2020-01,  35.50
12    2020-02, 2020-01,  35.50
.
.

My expected result is to group it so that I get something like:

1    2020-01, 2020-01,  426
2    2020-02, 2020-01,  426
3    2020-03, 2020-01,  426
4    2020-04, 2020-01,  426
5    2020-05, 2020-01,  426
6    2020-01, 2020-02,  426
7    2020-02, 2020-02,  426
8    2020-03, 2020-02,  426
9    2020-04, 2020-02,  426
10    2020-05, 2020-02,  426
11   2020-01, 2020-03,  426
12    2020-02, 2020-03,  426
13    2020-03, 2020-03,  426
14    2020-04, 2020-03,  426
15    2020-05, 2020-03,  426
.
.
.
and so on, with more data and more variation in amount, but the two dates would always meet each other at some point.

Basically, my solution was to group by date and createdAt and aggregate amount with a sum.

So something like:

firststep = df.groupby(["date", "createdAt"])
second_df = firststep.agg({'amount': 'sum'})
reset_df = second_df.reset_index()

But what I get is something like:

1    2020-01, 2020-01,  177.5
2    2020-01, 2020-02,  177.5
3    2020-01, 2020-03,  177.5
4    2020-01, 2020-04,  177.5
5    2020-01, 2020-05,  177.5
6    2020-02, 2020-02,  142
7    2020-02, 2020-03,  142
8    2020-02, 2020-04,  142
9    2020-02, 2020-05,  142
10    2020-03, 2020-03,  106.5
11    2020-03, 2020-04,  106.5
12    2020-03, 2020-05,  106.5
.
.

My values were supposed to meet up with each other at some point, but some groupings are missing, and each group starts after the previous date. For example, after 2020-01, 2020-05 the next row is 2020-02, 2020-02 and not 2020-02, 2020-01.

I'm trying to figure out how to group by the two columns without losing some groupings. How do I get my desired output in a dataframe?
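One possible reading of "not losing some groupings" is that every combination of date and createdAt should appear in the result, even when no row has that combination. groupby only returns the pairs that actually occur in the data. A minimal sketch of filling in the rest, assuming a small made-up frame (the data below is illustrative, not the asker's) and assuming that missing pairs should get an amount of 0:

```python
import pandas as pd

# Illustrative data only: not every (date, createdAt) pair occurs.
df = pd.DataFrame({
    "date": ["2020-01", "2020-02", "2020-02", "2020-03"],
    "createdAt": ["2020-01", "2020-01", "2020-02", "2020-03"],
    "amount": [35.5, 35.5, 35.5, 35.5],
})

# groupby keeps only the pairs that are present in the data.
sums = df.groupby(["date", "createdAt"])["amount"].sum()

# Build every possible pair and reindex onto it; missing pairs get 0.0
# (the fill value is an assumption, NaN may suit other cases better).
full_index = pd.MultiIndex.from_product(
    [sorted(df["date"].unique()), sorted(df["createdAt"].unique())],
    names=["date", "createdAt"],
)
full = sums.reindex(full_index, fill_value=0.0).reset_index()
print(full)
```

Whether this matches the desired output depends on what the missing pairs are supposed to contain.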

If I understand you correctly, and based on your input and desired output, you should not group the dataframe on both the date and createdAt columns. Rather, you should group it on createdAt alone:

df.groupby("createdAt").sum()

Output

createdAt  amount
2020-01    426
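To try this on the sample from the question, here is a minimal self-contained sketch. The frame is rebuilt by hand from the twelve example rows, with the dates held as plain strings (an assumption, since the question does not show the dtypes). The amount column is selected explicitly so that the date column is left out of the sum:

```python
import pandas as pd

# Rebuild the twelve sample rows: date cycles through five months,
# createdAt is always 2020-01, amount is always 35.50.
months = ["2020-01", "2020-02", "2020-03", "2020-04", "2020-05"]
df = pd.DataFrame({
    "date": [months[i % 5] for i in range(12)],
    "createdAt": ["2020-01"] * 12,
    "amount": [35.50] * 12,
})

# One row per createdAt value, with the summed amount.
totals = df.groupby("createdAt", as_index=False)["amount"].sum()
print(totals)
```

With this sample there is a single createdAt value, so the result has one row whose amount is 12 * 35.50 = 426.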

Try:

df["amount"] = df.groupby("createdAt")["amount"].transform("sum")

>>> df
       date createdAt  amount
0   2020-01   2020-01   426.0
1   2020-02   2020-01   426.0
2   2020-03   2020-01   426.0
3   2020-04   2020-01   426.0
4   2020-05   2020-01   426.0
5   2020-01   2020-01   426.0
6   2020-02   2020-01   426.0
7   2020-03   2020-01   426.0
8   2020-04   2020-01   426.0
9   2020-05   2020-01   426.0
10  2020-01   2020-01   426.0
11  2020-02   2020-01   426.0
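Note that the assignment above overwrites the original amount values. If those are still needed, a variant is to write the group total into a separate column. A minimal sketch on the same sample data, where the column name createdAt_total is made up for illustration:

```python
import pandas as pd

# Same twelve sample rows as in the question, dates kept as strings.
months = ["2020-01", "2020-02", "2020-03", "2020-04", "2020-05"]
df = pd.DataFrame({
    "date": [months[i % 5] for i in range(12)],
    "createdAt": ["2020-01"] * 12,
    "amount": [35.50] * 12,
})

# transform("sum") returns one value per original row, so the frame
# keeps all twelve rows; "createdAt_total" is a hypothetical name.
df["createdAt_total"] = df.groupby("createdAt")["amount"].transform("sum")
print(df)
```

Unlike a plain groupby sum, transform does not collapse the rows, which is why every date and createdAt pair from the input is still present afterwards.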
