How do I adjust the starting time in Grouper?
Starting with this sample DataFrame:
import datetime as DT
import pandas as pd
df = pd.DataFrame({
'Buyer': 'Carl Mark Carl Joe Joe Carl'.split(),
'Quantity': [1,3,5,8,9,3],
'Date' : [
DT.datetime(2013,1,1,13,0),
DT.datetime(2013,3,1,13,5),
DT.datetime(2013,5,1,20,0),
DT.datetime(2013,8,2,10,0),
DT.datetime(2013,9,2,12,0),
DT.datetime(2013,11,2,14,0),
]})
df = df.set_index('Date')
df.groupby(pd.Grouper(freq='1MS'))["Quantity"].count()
Date
2013-01-01 1
2013-02-01 0
2013-03-01 1
2013-04-01 0
2013-05-01 1
2013-06-01 0
2013-07-01 0
2013-08-01 1
2013-09-01 1
2013-10-01 0
2013-11-01 1
df.groupby(pd.Grouper(freq='2MS'))["Quantity"].count()
Date
2013-01-01 1
2013-03-01 1
2013-05-01 1
2013-07-01 1
2013-09-01 1
2013-11-01 1
What I am looking for is "2MS" bins anchored at each date in the index, using Grouper or TimeGrouper. The above returns "2MS" bins anchored at the first value in the index, 1/1/2013. How do I get a 2MS bin that starts from '8/1/2013' and gives a count of 2?
Targeting:
Date
2013-01-01 1
2013-03-01 1
2013-05-01 1
2013-08-01 2
2013-09-01 1
2013-11-01 1
Notes:
What I'm trying to do is group by windows that are based on the index values. The first window would start at 1/1, the second at 3/1, and the third at 5/1, and each window would span 2MS. Grouper currently starts slicing at the first date and continues in fixed two-month intervals. The fourth interval should start on 8/1 and end on 10/2. Right now, the 8/2 row falls into the interval that starts on 7/1.
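To make the binning described above concrete, here is a minimal sketch that lists which rows each 2MS bin collects. It rebuilds only the Quantity column of the sample data; the `bins` dictionary is a helper name introduced for illustration.

```python
from datetime import datetime
import pandas as pd

# Same dates and quantities as the sample DataFrame (Buyer column omitted).
df = pd.DataFrame(
    {"Quantity": [1, 3, 5, 8, 9, 3]},
    index=pd.DatetimeIndex(
        [datetime(2013, 1, 1, 13, 0), datetime(2013, 3, 1, 13, 5),
         datetime(2013, 5, 1, 20, 0), datetime(2013, 8, 2, 10, 0),
         datetime(2013, 9, 2, 12, 0), datetime(2013, 11, 2, 14, 0)],
        name="Date"),
)

# Collect the row timestamps that land in each fixed two-month bin.
bins = {}
for label, group in df.groupby(pd.Grouper(freq="2MS")):
    bins[label] = list(group.index)
    print(label.date(), "->", [str(ts) for ts in group.index])
```

The bins are laid out once, starting from the month of the first row, so the 8/2 row is reported under the 2013-07-01 label rather than under 2013-08-01.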
You want a forward-looking rolling window, while pandas builds backward-looking rolling windows by default. So the idea is to reverse the order of your series, take a rolling window, and then restore the original order.
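As a minimal sketch of the reversal idea on its own, the toy series below (made-up values, not the question's data) is reversed, rolled with a window of 2, and reversed back, so each position ends up summing itself and the next element.

```python
import pandas as pd

# Toy values, chosen only to illustrate the trick.
s = pd.Series([1, 0, 1, 1, 0])

# Reverse, take a backward-looking window of 2, then reverse again.
# min_periods=1 lets the last element use a window of just itself.
forward = s[::-1].rolling(2, min_periods=1).sum()[::-1]

print(forward.tolist())
# [1.0, 1.0, 2.0, 1.0, 0.0]
```

The result is float because `rolling(...).sum()` returns floats; cast with `astype(int)` if integer counts are needed.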
This is what you already had:
from datetime import datetime
import pandas as pd
df = pd.DataFrame({'Buyer': 'Carl Mark Carl Joe Joe Carl'.split(),
'Quantity': [1, 3, 5, 8, 9, 3],
'Date' : [datetime(2013, 1, 1, 13, 0),
datetime(2013, 3, 1, 13, 5),
datetime(2013, 5, 1, 20, 0),
datetime(2013, 8, 2, 10, 0),
datetime(2013, 9, 2, 12, 0),
datetime(2013, 11, 2, 14, 0)]})
df = df.set_index('Date')
print(df)
# Buyer Quantity
# Date
# 2013-01-01 13:00:00 Carl 1
# 2013-03-01 13:05:00 Mark 3
# 2013-05-01 20:00:00 Carl 5
# 2013-08-02 10:00:00 Joe 8
# 2013-09-02 12:00:00 Joe 9
# 2013-11-02 14:00:00 Carl 3
g1 = df.resample('MS')["Quantity"].count()
print(g1)
# Date
# 2013-01-01 1
# 2013-02-01 0
# 2013-03-01 1
# 2013-04-01 0
# 2013-05-01 1
# 2013-06-01 0
# 2013-07-01 0
# 2013-08-01 1
# 2013-09-01 1
# 2013-10-01 0
# 2013-11-01 1
# Freq: MS, Name: Quantity, dtype: int64
And this is how to get to the finish line:
# Reverse the monthly counts, sum over a 2-month window, then restore the order.
g2 = g1.sort_index(ascending=False).rolling(2, min_periods=0).sum().sort_index()
print(g2)
# Date
# 2013-01-01 1.0
# 2013-02-01 1.0
# 2013-03-01 1.0
# 2013-04-01 1.0
# 2013-05-01 1.0
# 2013-06-01 0.0
# 2013-07-01 1.0
# 2013-08-01 2.0
# 2013-09-01 1.0
# 2013-10-01 1.0
# 2013-11-01 1.0
# Freq: MS, Name: Quantity, dtype: float64
print(g2[g1 != 0].astype(int))
# Date
# 2013-01-01 1
# 2013-03-01 1
# 2013-05-01 1
# 2013-08-01 2
# 2013-09-01 1
# 2013-11-01 1
# Name: Quantity, dtype: int64
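As an alternative sketch, assuming pandas 1.1 or later, the same forward-looking sum can be written without reversing the series by passing `pandas.api.indexers.FixedForwardWindowIndexer` to `rolling`. The variable names `monthly`, `forward`, and `result` are introduced here for illustration, and only the Quantity column of the sample data is rebuilt.

```python
from datetime import datetime
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Same dates and quantities as the sample DataFrame (Buyer column omitted).
df = pd.DataFrame(
    {"Quantity": [1, 3, 5, 8, 9, 3]},
    index=pd.DatetimeIndex(
        [datetime(2013, 1, 1, 13, 0), datetime(2013, 3, 1, 13, 5),
         datetime(2013, 5, 1, 20, 0), datetime(2013, 8, 2, 10, 0),
         datetime(2013, 9, 2, 12, 0), datetime(2013, 11, 2, 14, 0)],
        name="Date"),
)

# Monthly counts, including empty months.
monthly = df.resample("MS")["Quantity"].count()

# Each window covers the current month and the one after it.
indexer = FixedForwardWindowIndexer(window_size=2)
forward = monthly.rolling(indexer, min_periods=1).sum()

# Keep only the months that actually contain a row.
result = forward[monthly != 0].astype(int)
print(result)
```

This should give the same six rows as the reversed-rolling version, with 2013-08-01 counting both the 8/2 and 9/2 rows.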