I need to sum values over custom periods that run from the 15th of one month to the 14th of the next (e.g. 15.10-14.11, 15.11-14.12, etc.), grouped by each id - dp pair.
My df:
date id dp value
2020-11-13 300000 002 500,00
2020-11-14 352575 001 400,00
2020-11-15 352575 001 100,00
2020-11-16 352575 001 500,00
...............................
`days from 17.11 to 14.12`
...............................
2020-12-15 300000 002 700,00
2020-12-16 352575 001 200,00
2020-12-17 352575 001 500,00
2020-12-18 352575 002 600,00
Expected output table (the exact form is not strict: it doesn't matter how the periods are labelled in the output, e.g. 2020-11-01 could mean 15.10-14.11):
period id dp value
2020-11-01 300000 002 500,00
2020-11-01 352575 001 400,00
2020-11-01 352575 002 1000,00
2020-12-01 300000 002 700,00
2020-12-01 352575 001 700,00
2020-12-01 352575 002 600,00
...............................
I've tried to solve the problem with a grouper function, but it doesn't work for me:
def grouper(x):
    d = x.rename('date').to_frame().reset_index()
    return d.groupby(pd.Grouper(key='date', freq='M', origin='start')).cumsum()

df['sum'] = df.groupby(['id', 'dp'])['date'].transform(grouper)
IIUC, here is one alternative. It assumes df has a DatetimeIndex (date) sorted in ascending order, and that each period contains exactly one row dated the 15th:
df['custom_period'] = (df.index.day == 15).cumsum()  # counter increases at each row dated the 15th
df['value'] = df.groupby(['custom_period', 'id', 'dp'])['value'].transform('cumsum')  # running sum within each period
df.drop('custom_period', axis=1, inplace=True)
Output:
id dp value
date
2020-11-13 300000 2 500
2020-11-14 352575 1 400
2020-11-15 352575 1 100
2020-11-16 352575 1 600
2020-12-15 300000 2 700
2020-12-16 352575 1 200
2020-12-17 352575 1 700
2020-12-18 352575 2 600
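If some period has no row dated exactly the 15th, or has several, a marker-based counter will drift. As a hedged sketch of a different technique (not the one used in the answer above), the period can be derived from each date directly: subtracting 14 days maps the 15th to the 1st of the same month and the 14th to the end of the previous month, so the calendar month of the shifted date identifies the period. The column names `period` and `running` are made up for illustration, and the sample frame is rebuilt from the question's rows with integer values.

```python
import pandas as pd

# Sample rows from the question, indexed by date.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2020-11-13", "2020-11-14", "2020-11-15", "2020-11-16",
             "2020-12-15", "2020-12-16", "2020-12-17", "2020-12-18"]
        ),
        "id": [300000, 352575, 352575, 352575, 300000, 352575, 352575, 352575],
        "dp": [2, 1, 1, 1, 2, 1, 1, 2],
        "value": [500, 400, 100, 500, 700, 200, 500, 600],
    }
).set_index("date")

# Shift every date back by 14 days, then take the calendar month.
# A period running 15.11-14.12 is therefore labelled 2020-11 (its start month).
df["period"] = (df.index - pd.Timedelta(days=14)).to_period("M")

# Running sum of value inside each (period, id, dp) group.
df["running"] = df.groupby(["period", "id", "dp"])["value"].cumsum()

print(df)
```

Because the label comes from the date itself, the result does not depend on row order or on which days happen to be present in the data.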
First, create a column that holds the year and month of the period each date belongs to. Periods run from the 15th of one month to the 14th of the next, so any date from the 15th to the 31st should fall under the next month. You can use df.date.dt.day to get the day part of the date and check whether it is greater than 14. If so, use pd.DateOffset(months=1) to move the date to the next month.
Now that you have the year-month in another column, use it in the groupby. If you want each row to carry the group total, use groupby.transform(). If you want only the summary, use .sum().
Here's the code to get you the sum for each row.
c = ['date','id','dp','value']
d = [['2020-11-13', 300000, '002', 500.00],
['2020-11-14', 352575, '001', 400.00],
['2020-11-15', 352575, '001', 100.00],
['2020-11-16', 352575, '001', 500.00],
['2020-12-15', 300000, '002', 700.00],
['2020-12-16', 352575, '001', 200.00],
['2020-12-17', 352575, '001', 500.00],
['2020-12-18', 352575, '002', 600.00]]
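import pandas as pd
df = pd.DataFrame(d, columns=c)
df['date'] = pd.to_datetime(df['date'])
# Default label: the calendar month of the date itself
df['Year-Mon'] = df.date.dt.strftime('%Y-%m')
# Dates on or after the 15th belong to the period labelled with the next month
df.loc[df.date.dt.day > 14, 'Year-Mon'] = (df.date + pd.DateOffset(months=1)).dt.strftime('%Y-%m')
# Broadcast each (id, dp, period) total to every row of that group
df['sum'] = df.groupby(['id', 'dp', 'Year-Mon'])['value'].transform('sum')
print(df)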
The output of this will be:
date id dp value Year-Mon sum
0 2020-11-13 300000 002 500.0 2020-11 500.0
1 2020-11-14 352575 001 400.0 2020-11 400.0
2 2020-11-15 352575 001 100.0 2020-12 600.0
3 2020-11-16 352575 001 500.0 2020-12 600.0
4 2020-12-15 300000 002 700.0 2021-01 700.0
5 2020-12-16 352575 001 200.0 2021-01 700.0
6 2020-12-17 352575 001 500.0 2021-01 700.0
7 2020-12-18 352575 002 600.0 2021-01 600.0
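For the summary-only variant mentioned above (one row per period and id/dp pair, closer to the table the question asks for), a minimal sketch could look like the following. It rebuilds the same sample data so that it runs on its own; the names `shifted` and `summary` are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2020-11-13", "2020-11-14", "2020-11-15", "2020-11-16",
             "2020-12-15", "2020-12-16", "2020-12-17", "2020-12-18"]
        ),
        "id": [300000, 352575, 352575, 352575, 300000, 352575, 352575, 352575],
        "dp": ["002", "001", "001", "001", "002", "001", "001", "002"],
        "value": [500.0, 400.0, 100.0, 500.0, 700.0, 200.0, 500.0, 600.0],
    }
)

# Keep dates up to the 14th as they are; move later dates one month forward,
# so each row is labelled with the month in which its period ends.
shifted = df["date"].where(df["date"].dt.day <= 14,
                           df["date"] + pd.DateOffset(months=1))
df["Year-Mon"] = shifted.dt.strftime("%Y-%m")

# One row per (period, id, dp) with the total value.
summary = df.groupby(["Year-Mon", "id", "dp"], as_index=False)["value"].sum()
print(summary)
```

Here `.sum()` collapses each group to a single row, whereas `transform('sum')` keeps the original rows and repeats the group total on each of them.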