Pandas groupby: Count the number of occurrences within a time range for each group
Pandas filter by date range within each group
import numpy as np
import pandas as pd

# Sample data: three Ids with 4, 3 and 4 rows respectively
df = pd.DataFrame({
    'Id': np.repeat([2, 3, 4], [4, 3, 4]),
    'Date': ['12/31/2019', '1/1/2020', '1/5/2020', '1/20/2020',
             '1/5/2020', '1/10/2020', '1/30/2020', '2/2/2020',
             '2/4/2020', '2/10/2020', '2/25/2020'],
    'Value': [*'abcbdeefffg']
})
First, use to_datetime to convert the Date column to Timestamp values:
df['Date'] = pd.to_datetime(df['Date'])
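Using concat with groupby (keep, within each group, the rows dated no later than 14 days after that group's earliest date):
pd.concat([
    d[d.Date <= d.Date.min() + pd.offsets.Day(14)]
    for _, d in df.groupby('Id')
])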
   Id       Date Value
0   2 2019-12-31     a
1   2 2020-01-01     b
2   2 2020-01-05     c
4   3 2020-01-05     d
5   3 2020-01-10     e
7   4 2020-02-02     f
8   4 2020-02-04     f
9   4 2020-02-10     f
Using groupby with map (compute each Id's cutoff once, then map it back onto the rows):
df[df.Date <= df.Id.map(df.groupby('Id').Date.min() + pd.offsets.Day(14))]
   Id       Date Value
0   2 2019-12-31     a
1   2 2020-01-01     b
2   2 2020-01-05     c
4   3 2020-01-05     d
5   3 2020-01-10     e
7   4 2020-02-02     f
8   4 2020-02-04     f
9   4 2020-02-10     f
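The map-based filter above can also be written with groupby(...).transform, which returns the per-group minimum already aligned to the original rows. This is a minimal sketch of that variant, not part of the original answer; the names cutoff and result are illustrative, and pd.Timedelta(days=14) is used here in place of pd.offsets.Day(14).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Id': np.repeat([2, 3, 4], [4, 3, 4]),
    'Date': ['12/31/2019', '1/1/2020', '1/5/2020', '1/20/2020',
             '1/5/2020', '1/10/2020', '1/30/2020', '2/2/2020',
             '2/4/2020', '2/10/2020', '2/25/2020'],
    'Value': [*'abcbdeefffg']
})
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

# Per-row cutoff: earliest Date in the row's Id group plus 14 days.
# transform('min') broadcasts the group minimum back to every row.
cutoff = df.groupby('Id')['Date'].transform('min') + pd.Timedelta(days=14)

# Keep only rows on or before their group's cutoff.
result = df[df['Date'] <= cutoff]
print(result)
```

Because transform keeps the original index, no map or merge step is needed to line the cutoffs up with the rows.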
I struggle with pandas.concat, so you could try merge instead:
# Convert Date to datetime
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')
# Get min Date for each Id and add two weeks (14 days)
s = df.groupby('Id')['Date'].min() + pd.offsets.Day(14)
# Merge df and s
df = df.merge(s, left_on='Id', right_index=True)
# Keep records where Date is less than the allowed limit
df = df.loc[df['Date_x'] <= df['Date_y'], ['Id','Date_x','Value']]
# Rename Date_x to Date (optional)
df = df.rename(columns={'Date_x': 'Date'})
The result is:
   Id       Date Value
0   2 2019-12-31     a
1   2 2020-01-01     b
2   2 2020-01-05     c
4   3 2020-01-05     d
5   3 2020-01-10     e
7   4 2020-02-02     f
8   4 2020-02-04     f
9   4 2020-02-10     f
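In the merge approach, the Date_x and Date_y suffixes appear because both the frame and the Series are named Date. One way to sidestep the later rename is to give the Series its own name before merging. The sketch below assumes a hypothetical column name, Limit, which is not from the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Id': np.repeat([2, 3, 4], [4, 3, 4]),
    'Date': ['12/31/2019', '1/1/2020', '1/5/2020', '1/20/2020',
             '1/5/2020', '1/10/2020', '1/30/2020', '2/2/2020',
             '2/4/2020', '2/10/2020', '2/25/2020'],
    'Value': [*'abcbdeefffg']
})
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

# Per-Id cutoff, renamed to 'Limit' (hypothetical name) to avoid a name clash
limit = (df.groupby('Id')['Date'].min() + pd.offsets.Day(14)).rename('Limit')

# Attach each Id's cutoff to its rows; no _x/_y suffixes are produced
merged = df.merge(limit, left_on='Id', right_index=True)

# Keep rows on or before the cutoff, then drop the helper column
result = merged.loc[merged['Date'] <= merged['Limit']].drop(columns='Limit')
print(result)
```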