I'm associating two dataframes to check if a visit happened.
def visiter(d,visits):
    visit = visits[d.start_date:d.end_date]
    out = visit[(visit['user_id'] == d.user_id)&(visit['merchant_id']==d.merchant_id)].head(1) #only take the first visit
    return (out.index.date.astype('str'))

data['visited_at']= data.apply(lambda x: str(visiter(x,visits)),axis =1 )
The output of the above column is:
0 []
1 []
2 ['2017-04-24']
3 []
4 []
Name: visited_at, dtype: object
Converting the column using pd.to_datetime(data.visited_at, errors = 'coerce') makes the entire column NaT.
Are there any changes I could make to the code to get the datetime in the correct format, like the following: 2017-05-01 00:00:00?
Edit1: The dataframe looks like the following:
Index id user_id merchant_id marketing_email_id start_date end_date email_status sms_status created_at visited_at
0 68989 68990 13277 38 437 2016-04-11 00:00:00 2016-04-16 00:00:00 1 NaN 2016-04-11 11:05:31 []
1 403557 403558 195246 179 2218 2017-06-09 00:00:00 2017-06-12 00:00:00 0 1 2017-06-09 06:01:04 []
2 333381 333382 127359 514 1820 2017-04-24 00:00:00 2017-05-01 00:00:00 0 1 2017-04-24 10:00:33 ['2017-04-24']
3 511815 511816 151653 259 1136 2017-08-05 00:00:00 2017-08-08 00:00:00 0 1 2017-08-05 11:31:19 []
4 167172 167173 51546 32 363 2016-08-05 00:00:00 2016-08-15 00:00:00 1 NaN 2016-08-05 12:00:43 []
You need to remove the [] with str.strip before converting:
pd.to_datetime(data['visited_at'].str.strip('[]'), errors = 'coerce')
data['visited_at'] = pd.to_datetime(data['visited_at'].str.strip('[]'), errors = 'coerce')
print (data)
Index id user_id merchant_id \
0 68989 68990 13277 38 437 2016-04-11 00:00:00
1 403557 403558 195246 179 2218 2017-06-09 00:00:00
2 333381 333382 127359 514 1820 2017-04-24 00:00:00
3 511815 511816 151653 259 1136 2017-08-05 00:00:00
4 167172 167173 51546 32 363 2016-08-05 00:00:00
marketing_email_id start_date end_date email_status \
0 68989 68990 13277 2016-04-16 00:00:00 1 NaN
1 403557 403558 195246 2017-06-12 00:00:00 0 1.0
2 333381 333382 127359 2017-05-01 00:00:00 0 1.0
3 511815 511816 151653 2017-08-08 00:00:00 0 1.0
4 167172 167173 51546 2016-08-15 00:00:00 1 NaN
sms_status created_at visited_at
0 68989 68990 13277 2016-04-11 11:05:31 NaT
1 403557 403558 195246 2017-06-09 06:01:04 NaT
2 333381 333382 127359 2017-04-24 10:00:33 2017-04-24
3 511815 511816 151653 2017-08-05 11:31:19 NaT
4 167172 167173 51546 2016-08-05 12:00:43 NaT
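As a minimal, self-contained sketch of the strip approach, the snippet below rebuilds only the visited_at column from the question as a hypothetical Series of strings. One deliberate difference from the line above: it also strips the single quotes (strip("[]'") rather than strip('[]')), so the result does not depend on the date parser tolerating quoted input. That is an assumption on my part about what is safest, not something the question requires.

```python
import pandas as pd

# Hypothetical sample mirroring the question's column: the values are strings, not lists.
visited_at = pd.Series(["[]", "[]", "['2017-04-24']", "[]", "[]"], name="visited_at")

# Parsing the raw strings fails on every row, so errors="coerce" turns them all into NaT.
raw = pd.to_datetime(visited_at, errors="coerce")

# Strip brackets and quotes first; empty strings then become NaT and the date parses.
cleaned = pd.to_datetime(visited_at.str.strip("[]'"), errors="coerce")

print(cleaned)
```

The cleaned column has a datetime dtype, so the date prints as a timestamp at midnight when viewed as a scalar.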
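Another solution is to return a scalar from the function instead of an array: check whether the filtered DataFrame is empty with if-else, and if it is not, select the first value with [0]: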
import numpy as np

def visiter(d,visits):
    visit = visits[d.start_date:d.end_date]
    out=visit[(visit['user_id'] == d.user_id)&(visit['merchant_id']==d.merchant_id)].head(1)
    # NaN when there was no visit, otherwise the date of the first visit
    out = np.nan if out.empty else out.index.date[0]
    return (out)
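To see the scalar-returning idea end to end, here is a hedged sketch on made-up data. The visits and data frames below are hypothetical stand-ins (the question does not show visits), the function is renamed first_visit to keep it distinct, and it returns pd.NaT and a normalized Timestamp rather than np.nan and a date object. Those are substitutions of mine so that the column comes out with a datetime dtype directly.

```python
import pandas as pd

# Hypothetical visit log indexed by visit time; label slicing needs a sorted index.
visits = pd.DataFrame(
    {"user_id": [38, 514, 514], "merchant_id": [437, 1820, 1820]},
    index=pd.to_datetime(["2016-05-01 09:00", "2017-04-24 10:30", "2017-04-29 18:00"]),
).sort_index()

# Hypothetical campaign rows with only the columns the function reads.
data = pd.DataFrame(
    {
        "user_id": [38, 514],
        "merchant_id": [437, 1820],
        "start_date": pd.to_datetime(["2016-04-11", "2017-04-24"]),
        "end_date": pd.to_datetime(["2016-04-16", "2017-05-01"]),
    }
)

def first_visit(d, visits):
    # Visits inside the campaign window, then only this user and merchant.
    window = visits.loc[d.start_date:d.end_date]
    match = window[(window["user_id"] == d.user_id) & (window["merchant_id"] == d.merchant_id)]
    # Return a scalar: NaT when nothing matched, else the first visit's day at midnight.
    return pd.NaT if match.empty else match.index[0].normalize()

# No str() wrapper here, so the result stays a datetime instead of becoming text.
data["visited_at"] = pd.to_datetime(data.apply(first_visit, axis=1, visits=visits))
print(data)
```

Dropping the str(...) wrapper from the original apply call is the key change: it is that wrapper which turns each result into text such as "['2017-04-24']".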
Are the values in visited_at actually lists ( [] ), or string representations of lists ( '[]' )?
If lists, you can use apply:
visited_at.apply(lambda x: pd.to_datetime(x[0]) if len(x) else x)
If strings of lists, you can hack it with:
visited_at.apply(lambda x: pd.to_datetime(x[1:-1]) if len(x)>2 else x)
Either way you get:
0 []
1 []
2 2017-04-24 00:00:00
3 []
4 []
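If you are not sure which of the two cases you have, one possible sketch is a single helper that handles both, using ast.literal_eval to turn a string such as "['2017-04-24']" back into a list. The helper name first_date and the sample Series are hypothetical, and it returns pd.NaT for empty entries (instead of leaving [] in place) so the column can hold a datetime dtype.

```python
import ast
import pandas as pd

def first_date(value):
    # Accept either a real list or the string representation of one.
    items = ast.literal_eval(value) if isinstance(value, str) else value
    # Empty list -> NaT, otherwise parse the first element.
    return pd.to_datetime(items[0]) if len(items) else pd.NaT

# Hypothetical samples for the two cases discussed above.
as_lists = pd.Series([[], [], ["2017-04-24"], [], []])
as_strings = as_lists.map(str)  # "[]" and "['2017-04-24']"

from_lists = pd.to_datetime(as_lists.apply(first_date))
from_strings = pd.to_datetime(as_strings.apply(first_date))
print(from_strings)
```

A quick way to tell the cases apart on real data is type(data['visited_at'].iloc[0]), which reports str or list.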