
Pandas Groupby/Grouper group by starting index value

How do I adjust the starting time in Grouper?

Starting with this sample DF:

import datetime as DT
import pandas as pd
df = pd.DataFrame({
'Buyer': 'Carl Mark Carl Joe Joe Carl'.split(),
'Quantity': [1,3,5,8,9,3],
'Date' : [
DT.datetime(2013,1,1,13,0),
DT.datetime(2013,3,1,13,5),
DT.datetime(2013,5,1,20,0),
DT.datetime(2013,8,2,10,0),
DT.datetime(2013,9,2,12,0),                                      
DT.datetime(2013,11,2,14,0),
]})
df = df.set_index('Date')

df.groupby(pd.Grouper(freq='1MS'))["Quantity"].count()

   Date
2013-01-01    1
2013-02-01    0
2013-03-01    1
2013-04-01    0
2013-05-01    1
2013-06-01    0
2013-07-01    0
2013-08-01    1
2013-09-01    1
2013-10-01    0
2013-11-01    1

df.groupby(pd.Grouper(freq='2MS'))["Quantity"].count()

   Date
2013-01-01    1
2013-03-01    1
2013-05-01    1
2013-07-01    1
2013-09-01    1
2013-11-01    1

What I'm looking for is a "2MS" window that starts from each index date, using Grouper or TimeGrouper. The above returns "2MS" bins counted from the first value in the index, 1/1/2013. How do I get a 2MS window starting from 8/1/2013, so that it gives a count of 2?

Targeting:

     Date
2013-01-01    1
2013-03-01    1
2013-05-01    1
2013-08-01    2
2013-09-01    1
2013-11-01    1

Notes:

What I'm trying to do is group by windows based on the index values. The 1st group would start its slice from 1/1, the 2nd from 3/1, the 3rd from 5/1, and each would span a 2MS period. Right now Grouper starts slicing from the first date and continues in fixed two-month intervals. The fourth interval should start on 8/1 and end on 10/2. Right now, the 8/2 row falls into the interval that starts on 7/1.

You want a forward-looking rolling window, while pandas builds backward-looking rolling windows by default. So the idea is to reverse the order of your series, take a rolling window, and then restore the original order.
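To see why reversing helps, here is a minimal sketch on a made-up integer series (the name `s` and its values are only for illustration, not taken from the question). A plain rolling sum at position i looks back at i-1 and i; the same call on the reversed series, flipped back, looks ahead at i and i+1.

```python
import pandas as pd

# Toy series, purely illustrative.
s = pd.Series([1, 0, 1, 1, 0])

# Default rolling window: each value is summed with the one BEFORE it.
backward = s.rolling(2, min_periods=1).sum()

# Reverse, roll, reverse again: each value is summed with the one AFTER it.
forward = s[::-1].rolling(2, min_periods=1).sum()[::-1]

print(backward.tolist())  # [1.0, 1.0, 1.0, 2.0, 1.0]
print(forward.tolist())   # [1.0, 1.0, 2.0, 1.0, 0.0]
```

The last element of `forward` has nothing after it, so with `min_periods=1` it is just its own value.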

This is what you already had:

from datetime import datetime

import pandas as pd

df = pd.DataFrame({'Buyer': 'Carl Mark Carl Joe Joe Carl'.split(),
                   'Quantity': [1, 3, 5, 8, 9, 3],
                   'Date' : [datetime(2013, 1, 1, 13, 0),
                             datetime(2013, 3, 1, 13, 5),
                             datetime(2013, 5, 1, 20, 0),
                             datetime(2013, 8, 2, 10, 0),
                             datetime(2013, 9, 2, 12, 0),                                      
                             datetime(2013, 11, 2, 14, 0)]})
df = df.set_index('Date')
print(df)

#                     Buyer  Quantity
# Date                               
# 2013-01-01 13:00:00  Carl         1
# 2013-03-01 13:05:00  Mark         3
# 2013-05-01 20:00:00  Carl         5
# 2013-08-02 10:00:00   Joe         8
# 2013-09-02 12:00:00   Joe         9
# 2013-11-02 14:00:00  Carl         3

g1 = df.resample('MS')["Quantity"].count()
print(g1)

# Date
# 2013-01-01    1
# 2013-02-01    0
# 2013-03-01    1
# 2013-04-01    0
# 2013-05-01    1
# 2013-06-01    0
# 2013-07-01    0
# 2013-08-01    1
# 2013-09-01    1
# 2013-10-01    0
# 2013-11-01    1
# Freq: MS, Name: Quantity, dtype: int64

And this is how to get to the finish line:

g2 = g1.sort_index(ascending=False).rolling(2, min_periods=0).sum().sort_index()
print(g2)

# Date
# 2013-01-01    1.0
# 2013-02-01    1.0
# 2013-03-01    1.0
# 2013-04-01    1.0
# 2013-05-01    1.0
# 2013-06-01    0.0
# 2013-07-01    1.0
# 2013-08-01    2.0
# 2013-09-01    1.0
# 2013-10-01    1.0
# 2013-11-01    1.0
# Freq: MS, Name: Quantity, dtype: float64

print(g2[g1 != 0].astype(int))

# Date
# 2013-01-01    1
# 2013-03-01    1
# 2013-05-01    1
# 2013-08-01    2
# 2013-09-01    1
# 2013-11-01    1
# Name: Quantity, dtype: int64
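As an alternative to sorting twice, newer pandas versions (1.1 and later, as far as I know) ship a forward-looking window indexer, `pd.api.indexers.FixedForwardWindowIndexer`. The sketch below rebuilds the same monthly counts and should give the same result as the reverse-and-roll approach above; treat it as one possible variant rather than the only way to do it.

```python
from datetime import datetime

import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

df = pd.DataFrame({'Buyer': 'Carl Mark Carl Joe Joe Carl'.split(),
                   'Quantity': [1, 3, 5, 8, 9, 3],
                   'Date': [datetime(2013, 1, 1, 13, 0),
                            datetime(2013, 3, 1, 13, 5),
                            datetime(2013, 5, 1, 20, 0),
                            datetime(2013, 8, 2, 10, 0),
                            datetime(2013, 9, 2, 12, 0),
                            datetime(2013, 11, 2, 14, 0)]}).set_index('Date')

# Monthly counts, including empty months (same as g1 above).
g1 = df.resample('MS')["Quantity"].count()

# Window of 2 rows looking forward: the current month plus the next one.
indexer = FixedForwardWindowIndexer(window_size=2)
g2 = g1.rolling(indexer, min_periods=1).sum()

# Keep only the months that actually contain a row.
result = g2[g1 != 0].astype(int)
print(result)
```

Because the window is counted in rows of the monthly series, this relies on `resample('MS')` having filled in the empty months first.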


 