Need to count the total sum of operations from the 15th day of a month to the 15th day of the next month
I need to sum values over floating date periods that start on the 15th day of each month, e.g. 15.10-14.11, 15.11-14.12, etc., grouped for each pair of id - dp.
My df:
date id dp value
2020-11-13 300000 002 500,00
2020-11-14 352575 001 400,00
2020-11-15 352575 001 100,00
2020-11-16 352575 001 500,00
...............................
`days from 17.11-12.14`
...............................
2020-12-15 300000 002 700,00
2020-12-16 352575 001 200,00
2020-12-17 352575 001 500,00
2020-12-18 352575 002 600,00
Expected output table (the format is not strict: it doesn't matter how the periods are labeled in the output; for example, 2020-11-01 could mean 15.10-14.11):
period id dp value
2020-11-01 300000 002 500,00
2020-11-01 352575 001 400,00
2021-11-01 352575 002 1000,00
2020-12-01 300000 002 700,00
2020-12-01 352575 001 700,00
2020-12-01 352575 002 600,00
...............................
I've tried to solve the problem with a grouper function, but it doesn't work for me:
def grouper(x):
    d = x.rename('date').to_frame().reset_index()
    return d.groupby(pd.Grouper(key='date', freq='M', origin='start')).cumsum()

df['sum'] = df.groupby(['id', 'dp'])['date'].transform(grouper)
IIUC, here is one alternative:
# Assumes df has a DatetimeIndex named 'date'.
# Shifting each date back 14 days maps a 15th-to-14th window onto a single calendar month.
df['custom_period'] = (df.index - pd.Timedelta(days=14)).to_period('M')
df['value'] = df.groupby(['custom_period', 'id', 'dp'])['value'].transform('cumsum')
df.drop('custom_period', axis=1, inplace=True)
Output:
            id  dp  value
date
2020-11-13  300000   2    500
2020-11-14  352575   1    400
2020-11-15  352575   1    100
2020-11-16  352575   1    600
2020-12-15  300000   2    700
2020-12-16  352575   1    200
2020-12-17  352575   1    700
2020-12-18  352575   2    600
First, you need to create a column that holds the year and month. The year and month have to be based on the window from the 15th of one month to the 14th of the next. Any date from the 15th to the 31st should fall under the next month. To do that, use pd.DateOffset(months=1) to move the value to the next month. You can use df.date.dt.day to get the day part of the date. Check whether it is greater than 14. If so, move the month forward by 1.
Now that you have the year-month in another column, use it in the groupby. If you want each row to carry the total, use groupby.transform(). If you want the summary only, use .sum().
Here's the code to get you the sum for each row.
c = ['date','id','dp','value']
d = [['2020-11-13', 300000, '002', 500.00],
['2020-11-14', 352575, '001', 400.00],
['2020-11-15', 352575, '001', 100.00],
['2020-11-16', 352575, '001', 500.00],
['2020-12-15', 300000, '002', 700.00],
['2020-12-16', 352575, '001', 200.00],
['2020-12-17', 352575, '001', 500.00],
['2020-12-18', 352575, '002', 600.00]]
import pandas as pd
df = pd.DataFrame(d,columns=c)
df['date'] = pd.to_datetime(df['date'])
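# By default, label each row with its own year-month
df['Year-Mon'] = df.date.dt.strftime('%Y-%m')
# Dates from the 15th onward belong to the period labeled with the next month
df.loc[df.date.dt.day > 14, 'Year-Mon'] = (df.date + pd.DateOffset(months=1)).dt.strftime('%Y-%m')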
df['sum'] = df.groupby(['id', 'dp', 'Year-Mon'])['value'].transform('sum')
print(df)
The output of this will be:
date id dp value Year-Mon sum
0 2020-11-13 300000 002 500.0 2020-11 500.0
1 2020-11-14 352575 001 400.0 2020-11 400.0
2 2020-11-15 352575 001 100.0 2020-12 600.0
3 2020-11-16 352575 001 500.0 2020-12 600.0
4 2020-12-15 300000 002 700.0 2021-01 700.0
5 2020-12-16 352575 001 200.0 2021-01 700.0
6 2020-12-17 352575 001 500.0 2021-01 700.0
7 2020-12-18 352575 002 600.0 2021-01 600.0
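If only the per-period summary is needed (one row per period, id and dp, as in the expected output of the question), the same labeling idea can be combined with .sum() instead of transform(). The following is a minimal sketch, not a definitive implementation: the column name period and the variable names shifted and summary are made up for illustration, and each period is labeled with the first day of the month in which it ends, so 2020-11-01 stands for 15.10-14.11.

```python
import pandas as pd

# Sample rows copied from the question; 'period' is a hypothetical column name.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2020-11-13", "2020-11-14", "2020-11-15", "2020-11-16",
             "2020-12-15", "2020-12-16", "2020-12-17", "2020-12-18"]
        ),
        "id": [300000, 352575, 352575, 352575, 300000, 352575, 352575, 352575],
        "dp": ["002", "001", "001", "001", "002", "001", "001", "002"],
        "value": [500.0, 400.0, 100.0, 500.0, 700.0, 200.0, 500.0, 600.0],
    }
)

# Days 1-14 keep their own month; days 15-31 roll over to the next month.
shifted = df["date"].where(
    df["date"].dt.day <= 14, df["date"] + pd.DateOffset(months=1)
)
# Normalize the label to the first day of that month.
df["period"] = shifted.dt.to_period("M").dt.to_timestamp()

# One row per (period, id, dp) with the summed value.
summary = df.groupby(["period", "id", "dp"], as_index=False)["value"].sum()
print(summary)
```

With the sample rows this gives six summary rows. Rows dated 2020-12-15 and later are labeled 2021-01-01, because they belong to the window 15.12-14.01.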