I have a dataset where each observation has a Date. I also have a list of events. I want to filter the dataset and keep an observation only if its date is within +/- 30 days of an event. I also want to know which event it is closest to.
For example, the main dataset looks like:
Product Date
Chicken 2008-09-08
Pork 2008-08-22
Beef 2008-08-15
Rice 2008-07-22
Coke 2008-04-05
Cereal 2008-04-03
Apple 2008-04-02
Banana 2008-04-01
It is generated by
import pandas as pd

d = {'Product': ['Apple', 'Banana', 'Cereal', 'Coke', 'Rice', 'Beef', 'Pork', 'Chicken'],
     'Date': ['2008-04-02', '2008-04-01', '2008-04-03', '2008-04-05',
              '2008-07-22', '2008-08-15', '2008-08-22', '2008-09-08']}
df = pd.DataFrame(data=d)
df['Date'] = pd.to_datetime(df['Date'])  # parse the strings into datetime64 values
Then I have a column of events:
Date
2008-05-03
2008-07-20
2008-09-01
generated by
event = pd.DataFrame({'Date': pd.to_datetime(['2008-05-03', '2008-07-20', '2008-09-01'])})
GOAL (EDITED)
I want to keep the rows in df only if df['Date'] is within a month of event['Date']. For example, the first event occurred on 2008-05-03, so I want to keep observations between 2008-04-03 and 2008-06-03, and also create a new column recording that the observation is closest to the event on 2008-05-03.
Product Date Event
Chicken 2008-09-08 2008-09-01
Pork 2008-08-22 2008-09-01
Beef 2008-08-15 2008-09-01
Rice 2008-07-22 2008-07-20
Coke 2008-04-05 2008-05-03
Cereal 2008-04-03 2008-05-03
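Use numpy broadcasting, assuming "within a month" means within 30 days:
import numpy as np
# compare every observation date with every event date, then keep rows with any gap of at most 30 days
df[np.any(np.abs(df.Date.values[:, None] - event.Date.values) / np.timedelta64(1, 'D') <= 30, axis=1)]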
Out[90]:
Product Date
0 Chicken 2008-09-08
1 Pork 2008-08-22
2 Beef 2008-08-15
3 Rice 2008-07-22
4 Coke 2008-04-05
5 Cereal 2008-04-03
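The broadcast above only filters; it does not say which event each row is closest to. A minimal sketch of one way to extend it, assuming a 30-day window and using `argmin` over the same distance matrix (the names `gaps`, `nearest`, `within` and `result` are illustrative, not from the original answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Cereal', 'Coke', 'Rice', 'Beef', 'Pork', 'Chicken'],
    'Date': pd.to_datetime(['2008-04-02', '2008-04-01', '2008-04-03', '2008-04-05',
                            '2008-07-22', '2008-08-15', '2008-08-22', '2008-09-08']),
})
event = pd.DataFrame({'Date': pd.to_datetime(['2008-05-03', '2008-07-20', '2008-09-01'])})

# gaps[i, j] is the absolute distance between observation i and event j
gaps = np.abs(df['Date'].values[:, None] - event['Date'].values)

# column index of the closest event for each observation
nearest = gaps.argmin(axis=1)

# True where the closest event is at most 30 days away (assumed window)
within = gaps.min(axis=1) <= np.timedelta64(30, 'D')

# attach the closest event date, then keep only the rows inside the window
result = df.assign(Event=event['Date'].values[nearest])[within]
print(result)
```

Note that by plain distance Beef (2008-08-15) is 17 days from 2008-09-01 and 26 days from 2008-07-20, so this sketch labels it with 2008-09-01. The distance matrix has one cell per (observation, event) pair, so this is best suited to a modest number of events.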
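Another option is pd.merge_asof with direction='nearest', which also records the closest event:
event['eDate'] = event.Date  # keep a copy of the event date, since the merge key is overwritten
df = pd.merge_asof(df.sort_values('Date'), event.sort_values('Date'), on="Date", direction='nearest')
# keep only the rows whose nearest event (eDate) is at most 30 days away
df[(df.Date - df.eDate).abs() <= pd.Timedelta('30 days')]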
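As a variation on the `merge_asof` approach, the `tolerance` parameter can apply the window during the merge itself, so unmatched rows come back with a missing event and can be dropped. This is a sketch under the assumption that a 30-day window is wanted; the column name `Event` and the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Cereal', 'Coke', 'Rice', 'Beef', 'Pork', 'Chicken'],
    'Date': pd.to_datetime(['2008-04-02', '2008-04-01', '2008-04-03', '2008-04-05',
                            '2008-07-22', '2008-08-15', '2008-08-22', '2008-09-08']),
})
event = pd.DataFrame({'Date': pd.to_datetime(['2008-05-03', '2008-07-20', '2008-09-01'])})

# rename the event key so both the observation date and the event date survive the merge
events = event.rename(columns={'Date': 'Event'}).sort_values('Event')

matched = pd.merge_asof(
    df.sort_values('Date'),           # merge_asof requires sorted keys
    events,
    left_on='Date',
    right_on='Event',
    direction='nearest',              # closest event on either side
    tolerance=pd.Timedelta(days=30),  # assumed window; rows farther away get NaT
)

# rows with no event inside the window have Event == NaT
result = matched.dropna(subset=['Event'])
print(result)
```

The result is ordered by `Date` because of the sort that `merge_asof` requires; sort again afterwards if the original order matters.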
I would use a list comprehension with an IntervalIndex:
ms = pd.DateOffset(months=1)  # pd.offsets.MonthOffset was removed in pandas 1.0
e1 = event.Date - ms
e2 = event.Date + ms
iix = pd.IntervalIndex.from_arrays(e1, e2, closed='both')
df.loc[[any(d in i for i in iix) for d in df.Date]]
Out[93]:
Product Date
2 Cereal 2008-04-03
3 Coke 2008-04-05
4 Rice 2008-07-22
5 Beef 2008-08-15
6 Pork 2008-08-22
7 Chicken 2008-09-08
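A calendar-month offset and a fixed 30-day window are not always the same interval, which matters when choosing between this answer and the 30-day ones. A small sketch comparing the two for the first event (variable names are illustrative):

```python
import pandas as pd

ev = pd.Timestamp('2008-05-03')

# calendar-month window: same day of month, one month earlier and later
month_lo = ev - pd.DateOffset(months=1)
month_hi = ev + pd.DateOffset(months=1)

# fixed-length window: exactly 30 days either side
day_lo = ev - pd.Timedelta(days=30)
day_hi = ev + pd.Timedelta(days=30)

print(month_lo.date(), month_hi.date())  # 2008-04-03 2008-06-03
print(day_lo.date(), day_hi.date())      # 2008-04-03 2008-06-02
```

Here the lower bounds coincide because April has 30 days, while the upper bounds differ by one day because May has 31. An observation dated 2008-06-03 would be kept by the month-based window and dropped by the 30-day one.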
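If only the months matter, irrespective of the day, this may be useful:
rng = []
for a, b in zip(event['Date'].dt.month - 1, event['Date'].dt.month + 1):
    # a and b are already the previous and next month, so take a..b inclusive
    # note: this does not wrap around year boundaries (month 0 or 13)
    rng = rng + list(range(a, b + 1))
df[df['Date'].dt.month.isin(set(rng))]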