Pandas df groupby dates period multiple
I have a dataframe with the columns date, createdAt, and amount. date and createdAt are dates, and amount is a float. Example:
1 2020-01, 2020-01, 35.50
2 2020-02, 2020-01, 35.50
3 2020-03, 2020-01, 35.50
4 2020-04, 2020-01, 35.50
5 2020-05, 2020-01, 35.50
6 2020-01, 2020-01, 35.50
7 2020-02, 2020-01, 35.50
8 2020-03, 2020-01, 35.50
9 2020-04, 2020-01, 35.50
10 2020-05, 2020-01, 35.50
11 2020-01, 2020-01, 35.50
12 2020-02, 2020-01, 35.50
.
.
My expected result is to group it so that I get something like:
1 2020-01, 2020-01, 426
2 2020-02, 2020-01, 426
3 2020-03, 2020-01, 426
4 2020-04, 2020-01, 426
5 2020-05, 2020-01, 426
6 2020-01, 2020-02, 426
7 2020-02, 2020-02, 426
8 2020-03, 2020-02, 426
9 2020-04, 2020-02, 426
10 2020-05, 2020-02, 426
11 2020-01, 2020-03, 426
12 2020-02, 2020-03, 426
13 2020-03, 2020-03, 426
14 2020-04, 2020-03, 426
15 2020-05, 2020-03, 426
.
.
.
and so on, with more data and more variation in amount, but the two dates would always meet each other at some point.
Basically, my solution was to group by date and createdAt and aggregate amount with sum.
So something like:
firststep = df.groupby(["date", "createdAt"])
second_df = firststep.agg({'amount': 'sum'})
reset_df = second_df.reset_index()
But what I get is something like:
1 2020-01, 2020-01, 177.5
2 2020-01, 2020-02, 177.5
3 2020-01, 2020-03, 177.5
4 2020-01, 2020-04, 177.5
5 2020-01, 2020-05, 177.5
6 2020-02, 2020-02, 142
7 2020-02, 2020-03, 142
8 2020-02, 2020-04, 142
9 2020-02, 2020-05, 142
10 2020-03, 2020-03, 106.5
11 2020-03, 2020-04, 106.5
12 2020-03, 2020-05, 106.5
.
.
My values were supposed to meet up with each other at some point, but some groupings are missing, and each group starts after the previous date. For example, after 2020-01, 2020-05 the next row is 2020-02, 2020-02 and not 2020-02, 2020-01.
I'm trying to figure out how to group by the two columns without losing some groupings. How do I get my desired output in a dataframe?
If I understand you correctly, and based on your input and your desired output, you should not group the dataframe on both the date and createdAt columns. Rather, you should group it on createdAt alone:
df.groupby("createdAt")[["amount"]].sum()
createdAt | amount |
---|---|
2020-01 | 426 |
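Here is a minimal, self-contained sketch of that grouping. It assumes the 12 sample rows shown in the question (35.50 each, all with createdAt 2020-01) and that both date columns are stored as plain strings; the names per_created and per_pair are just illustrative. For contrast it also groups on both columns. With string keys like these, groupby only returns the (date, createdAt) pairs that actually occur in the data, sorted by key, which may be why some pairs look like they are missing.

```python
import pandas as pd

# Assumed sample data, rebuilt from the rows shown in the question.
months = ["2020-01", "2020-02", "2020-03", "2020-04", "2020-05"]
df = pd.DataFrame({
    "date": [months[i % 5] for i in range(12)],
    "createdAt": ["2020-01"] * 12,
    "amount": [35.50] * 12,
})

# Group on createdAt only: one row per creation month.
per_created = df.groupby("createdAt")[["amount"]].sum()
print(per_created)

# Group on both columns: one row per (date, createdAt) pair present in the data.
per_pair = df.groupby(["date", "createdAt"], as_index=False)["amount"].sum()
print(per_pair)
```

Selecting the amount column before calling sum keeps the date column out of the aggregation, so the result does not depend on how a given pandas version treats non-numeric columns.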
Try:
df["amount"] = df.groupby("createdAt")["amount"].transform("sum")
>>> df
date createdAt amount
0 2020-01 2020-01 426.0
1 2020-02 2020-01 426.0
2 2020-03 2020-01 426.0
3 2020-04 2020-01 426.0
4 2020-05 2020-01 426.0
5 2020-01 2020-01 426.0
6 2020-02 2020-01 426.0
7 2020-03 2020-01 426.0
8 2020-04 2020-01 426.0
9 2020-05 2020-01 426.0
10 2020-01 2020-01 426.0
11 2020-02 2020-01 426.0
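To show how transform behaves when there is more than one createdAt value, here is a small sketch on made-up data (the rows, the amounts, and the total column name are assumptions, not taken from the question). transform("sum") returns one value per original row, so the frame keeps its length and every row carries the total of its own createdAt group.

```python
import pandas as pd

# Hypothetical data: two creation months with different amounts.
df = pd.DataFrame({
    "date": ["2020-01", "2020-02", "2020-01", "2020-02", "2020-03"],
    "createdAt": ["2020-01", "2020-01", "2020-02", "2020-02", "2020-02"],
    "amount": [10.0, 20.0, 1.0, 2.0, 3.0],
})

# Broadcast each createdAt group's total back onto the rows of that group.
df["total"] = df.groupby("createdAt")["amount"].transform("sum")
print(df)
```

Writing the result to a separate column, rather than overwriting amount, keeps the original per-row values available if they are needed later.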