
Converting a list of datetime strings into datetime

I'm associating two dataframes to check if a visit happened.

def visiter(d,visits):
    visit = visits[d.start_date:d.end_date]
    out = visit[(visit['user_id'] == d.user_id)&(visit['merchant_id']==d.merchant_id)].head(1) #only take the first visit
    return (out.index.date.astype('str'))

data['visited_at']= data.apply(lambda x: str(visiter(x,visits)),axis =1 )

The output of the above column is:

0                []
1                []
2    ['2017-04-24']
3                []
4                []
Name: visited_at, dtype: object

Converting the column with pd.to_datetime(data.visited_at, errors = 'coerce') turns every value into NaT.
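A minimal reproduction of the problem, assuming each cell holds the text of a list (for example "['2017-04-24']") rather than a plain date string. The Series below is a hand-built stand-in for data.visited_at, not the real data:

```python
import pandas as pd

# Hand-built stand-in for data['visited_at']: every cell is a string, not a list.
visited_at = pd.Series(["[]", "[]", "['2017-04-24']", "[]", "[]"], name="visited_at")

# The surrounding brackets keep the parser from recognising a date,
# so with errors='coerce' every value is replaced by NaT.
parsed = pd.to_datetime(visited_at, errors="coerce")
print(parsed.isna().all())
```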

Is there any change I could make to the code to get the datetime in the correct format, like the following: 2017-05-01 00:00:00

Edit1: The dataframe looks like the following:

   Index   id      user_id  merchant_id  marketing_email_id  start_date           end_date             email_status  sms_status  created_at           visited_at
0  68989   68990   13277    38           437                 2016-04-11 00:00:00  2016-04-16 00:00:00  1             NaN         2016-04-11 11:05:31  []
1  403557  403558  195246   179          2218                2017-06-09 00:00:00  2017-06-12 00:00:00  0             1           2017-06-09 06:01:04  []
2  333381  333382  127359   514          1820                2017-04-24 00:00:00  2017-05-01 00:00:00  0             1           2017-04-24 10:00:33  ['2017-04-24']
3  511815  511816  151653   259          1136                2017-08-05 00:00:00  2017-08-08 00:00:00  0             1           2017-08-05 11:31:19  []
4  167172  167173  51546    32           363                 2016-08-05 00:00:00  2016-08-15 00:00:00  1             NaN         2016-08-05 12:00:43  []

You need to remove the [] with str.strip:

pd.to_datetime(data['visited_at'].str.strip('[]'), errors = 'coerce')

data['visited_at'] = pd.to_datetime(data['visited_at'].str.strip('[]'), errors = 'coerce')
print (data['visited_at'])

0          NaT
1          NaT
2   2017-04-24
3          NaT
4          NaT
Name: visited_at, dtype: datetime64[ns]
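A self-contained sketch of the strip approach on a hand-built Series that stands in for data['visited_at']. As a small variation on the answer, it also strips the quote characters that remain once the brackets are gone, so that only the bare date text reaches pd.to_datetime:

```python
import pandas as pd

# Hand-built stand-in for data['visited_at'] (string representations of lists).
visited_at = pd.Series(["[]", "[]", "['2017-04-24']", "[]", "[]"], name="visited_at")

# Strip brackets and quotes from both ends; "[]" becomes an empty string.
cleaned = visited_at.str.strip("[]'")

# Empty strings cannot be parsed as dates and come back as NaT.
result = pd.to_datetime(cleaned, errors="coerce")
print(result)
```

The result has a datetime dtype, so the .dt accessor and date comparisons work on it directly.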

Another solution is to check whether the filtered DataFrame is empty and, if it is not, select the first value with [0] in an if-else:

import numpy as np

def visiter(d,visits):
    visit = visits[d.start_date:d.end_date]
    out=visit[(visit['user_id'] == d.user_id)&(visit['merchant_id']==d.merchant_id)].head(1)
    # NaN when there is no matching visit, otherwise the date of the first one
    out = np.nan if out.empty else out.index.date[0]
    return (out)
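The same idea as a runnable sketch. The visits and data frames below are made-up toy data, and first_visit is a hypothetical name; it assumes visits has a sorted DatetimeIndex, as the slicing in the question implies. It returns pd.NaT instead of np.nan so that the resulting column can be a datetime column, and it is called without the str() wrapper used in the question:

```python
import pandas as pd

# Toy visit log indexed by visit time (assumed sorted DatetimeIndex).
visits = pd.DataFrame(
    {"user_id": [13277, 127359, 127359], "merchant_id": [38, 514, 514]},
    index=pd.to_datetime(["2016-05-01 09:00", "2017-04-24 10:30", "2017-04-28 18:00"]),
).sort_index()

# Toy campaign rows with the date window to search in.
data = pd.DataFrame({
    "user_id": [13277, 127359],
    "merchant_id": [38, 514],
    "start_date": pd.to_datetime(["2016-04-11", "2017-04-24"]),
    "end_date": pd.to_datetime(["2016-04-16", "2017-05-01"]),
})

def first_visit(row, visits):
    """Return the day of the first matching visit in the row's window, or NaT."""
    window = visits.loc[row.start_date:row.end_date]
    match = window[(window["user_id"] == row.user_id)
                   & (window["merchant_id"] == row.merchant_id)]
    if match.empty:
        return pd.NaT
    # normalize() drops the time of day, leaving midnight of the visit date
    return match.index[0].normalize()

# pd.to_datetime makes sure the column ends up with a datetime dtype.
data["visited_at"] = pd.to_datetime(data.apply(first_visit, axis=1, visits=visits))
print(data["visited_at"])
```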

Are the values in visited_at actually lists ( [] ), or string representations of lists ( '[]' )?
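One way to find out, as a sketch: look at the Python types of the cells. Both kinds of column show up with dtype object, so the dtype alone does not tell them apart. cell_types is a hypothetical helper and the two Series are made-up examples:

```python
import pandas as pd

def cell_types(column):
    """Return the set of Python type names found in a column."""
    return set(column.map(lambda value: type(value).__name__))

as_lists = pd.Series([[], ["2017-04-24"]])        # real list objects
as_strings = pd.Series(["[]", "['2017-04-24']"])  # text that looks like lists

print(cell_types(as_lists))
print(cell_types(as_strings))
```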

If lists, you can use apply :

visited_at.apply(lambda x: pd.to_datetime(x[0]) if len(x) else x)

If strings of lists, you can hack it with:

visited_at.apply(lambda x: pd.to_datetime(x[1:-1]) if len(x)>2 else x)

Either way you get:

0                     []
1                     []
2    2017-04-24 00:00:00
3                     []
4                     []
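Because the empty rows are left as [], the column above stays dtype object. If a true datetime column is wanted, a variation (not part of the answer above) is to return pd.NaT for the empty cells. A sketch that handles both cases, with to_timestamp as a hypothetical helper name and made-up input data:

```python
import pandas as pd

def to_timestamp(cell):
    """Convert a list cell or a stringified-list cell to a Timestamp, or NaT if empty."""
    if isinstance(cell, str):
        # String case: drop the brackets and quotes around the date text.
        text = cell.strip("[]'")
        return pd.to_datetime(text) if text else pd.NaT
    # List case: take the first element when there is one.
    return pd.to_datetime(cell[0]) if len(cell) else pd.NaT

as_lists = pd.Series([[], ["2017-04-24"], []])
as_strings = pd.Series(["[]", "['2017-04-24']", "[]"])

# pd.to_datetime on the result makes sure the dtype is a datetime one.
from_lists = pd.to_datetime(as_lists.apply(to_timestamp))
from_strings = pd.to_datetime(as_strings.apply(to_timestamp))
print(from_lists)
```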
