
Need to sum operations from the 15th day of one month to the 15th day of the next month

I need to sum values over periods that each start on the 15th day of a month, e.g. 15.10-14.11, 15.11-14.12 etc., grouped by each id - dp pair.

My df :

  date        id      dp   value
  2020-11-13  300000  002  500,00  
  2020-11-14  352575  001  400,00
  2020-11-15  352575  001  100,00
  2020-11-16  352575  001  500,00
  ...............................
      `days from 17.11-14.12`
  ...............................
  2020-12-15  300000  002  700,00
  2020-12-16  352575  001  200,00
  2020-12-17  352575  001  500,00
  2020-12-18  352575  002  600,00

Expected output table. The format is not strict: it doesn't matter how the periods are labelled in the output; for example, 2020-11-01 could mean 15.10-14.11.

  period      id      dp   value
  2020-11-01  300000  002  500,00  
  2020-11-01  352575  001  400,00
  2021-11-01  352575  002  1000,00
  2020-12-01  300000  002  700,00
  2020-12-01  352575  001  700,00
  2020-12-01  352575  002  600,00
  ...............................

I've tried to solve the problem with pd.Grouper, but it doesn't work for me:

def grouper(x):
   d = x.rename('date').to_frame().reset_index()
   return d.groupby(pd.Grouper(key='date', freq='M', origin='start')).cumsum()

df['sum'] = df.groupby(['id', 'dp'])['date'].transform(grouper)

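IIUC, here is one alternative. It assumes date is the index (a DatetimeIndex) and produces a running total within each period:

df['custom_period'] = (df.index.day == 15 - 1).cumsum()  # 15 - 1 is 14: the counter steps up on each row dated the 14th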
df['value'] = df.groupby(['custom_period', 'id', 'dp'])['value'].transform('cumsum')
df.drop('custom_period', axis=1, inplace=True)

Output:

                id  dp  value
date                         
2020-11-13  300000   2    500
2020-11-14  352575   1    400
2020-11-15  352575   1    500
2020-11-16  352575   1   1000
2020-12-15  300000   2    700
2020-12-16  352575   1   1200
2020-12-17  352575   1   1700
2020-12-18  352575   2    600
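
The flag-and-cumsum approach above only starts a new period when a row with the boundary day is actually present in the data. As a hedged variant (not the answerer's code), the period can instead be derived arithmetically from each date, so it does not depend on which days appear. The sketch below assumes periods start on the 15th, that date is a DatetimeIndex, and uses a hypothetical helper column named period_key:

```python
import pandas as pd

# Sample rows from the question, with date as the index
df = pd.DataFrame(
    {
        "id": [300000, 352575, 352575, 352575, 300000, 352575, 352575, 352575],
        "dp": [2, 1, 1, 1, 2, 1, 1, 2],
        "value": [500, 400, 100, 500, 700, 200, 500, 600],
    },
    index=pd.to_datetime(
        ["2020-11-13", "2020-11-14", "2020-11-15", "2020-11-16",
         "2020-12-15", "2020-12-16", "2020-12-17", "2020-12-18"]
    ).rename("date"),
)

idx = df.index
# Month counter (year * 12 + month), bumped by one for days on or after the 15th,
# so 15.11-14.12 shares one key regardless of which days are present.
# "period_key" is a hypothetical helper column, not part of the original data.
df["period_key"] = idx.year * 12 + idx.month + (idx.day >= 15).astype(int)

# Running total within each (period, id, dp) group
running = df.groupby(["period_key", "id", "dp"])["value"].cumsum()
print(running)
```

Replacing cumsum() with transform('sum') would give the period total on every row rather than a running total.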

First, create a column holding the year and month of the period each row belongs to. Periods run from the 15th of one month to the 14th of the next, so any date from the 15th through the end of the month falls under the next month. Use df.date.dt.day to get the day of the month and check whether it is greater than 14. If so, add pd.DateOffset(months=1) to the date to move it into the next month before formatting it as year-month.

Now that the year-month is in its own column, group by it together with id and dp. If you want every row to carry its period total, use groupby(...).transform('sum'). If you only want one summary row per group, use .sum() instead.

Here's the code to get you the sum for each row.

c = ['date','id','dp','value']

d = [['2020-11-13',  300000,  '002',  500.00],  
  ['2020-11-14',  352575,  '001',  400.00],
  ['2020-11-15',  352575,  '001',  100.00],
  ['2020-11-16',  352575,  '001',  500.00],
  ['2020-12-15',  300000,  '002',  700.00],
  ['2020-12-16',  352575,  '001',  200.00],
  ['2020-12-17',  352575,  '001',  500.00],
  ['2020-12-18',  352575,  '002',  600.00]]

import pandas as pd
df = pd.DataFrame(d,columns=c)

df['date'] = pd.to_datetime(df['date'])

df['Year-Mon'] = df.date.dt.strftime('%Y-%m')

# Dates on or after the 15th belong to the period labelled with the next month
df.loc[df.date.dt.day > 14, 'Year-Mon'] = (df.date + pd.DateOffset(months=1)).dt.strftime('%Y-%m')

df['sum'] = df.groupby(['id', 'dp', 'Year-Mon'])['value'].transform('sum')
print (df)

The output of this will be:

        date      id   dp  value Year-Mon    sum
0 2020-11-13  300000  002  500.0  2020-11  500.0
1 2020-11-14  352575  001  400.0  2020-11  400.0
2 2020-11-15  352575  001  100.0  2020-12  600.0
3 2020-11-16  352575  001  500.0  2020-12  600.0
4 2020-12-15  300000  002  700.0  2021-01  700.0
5 2020-12-16  352575  001  200.0  2021-01  700.0
6 2020-12-17  352575  001  500.0  2021-01  700.0
7 2020-12-18  352575  002  600.0  2021-01  600.0
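
For the summary-only form mentioned above (one row per period, id and dp, which is closer to the table asked for in the question), a minimal sketch follows. It is a variation rather than the answerer's exact code: it assumes the same labelling (15.11-14.12 is labelled with December) and computes the label by shifting each date back 14 days and then moving one month forward. The column name period is an assumption.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2020-11-13", "2020-11-14", "2020-11-15", "2020-11-16",
             "2020-12-15", "2020-12-16", "2020-12-17", "2020-12-18"]
        ),
        "id": [300000, 352575, 352575, 352575, 300000, 352575, 352575, 352575],
        "dp": ["002", "001", "001", "001", "002", "001", "001", "002"],
        "value": [500.0, 400.0, 100.0, 500.0, 700.0, 200.0, 500.0, 600.0],
    }
)

# Shifting back 14 days maps the 15th onto the 1st, so each custom period
# lines up with one calendar month; adding 1 labels it with the month it ends in.
month = (df["date"] - pd.Timedelta(days=14)).dt.to_period("M") + 1
df["period"] = month.dt.to_timestamp()  # first day of the label month

# One row per (period, id, dp) with the summed value
summary = df.groupby(["period", "id", "dp"], as_index=False)["value"].sum()
print(summary)
```

Dropping the + 1 would label each period by the month it starts in instead; the question says either labelling is acceptable.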
