
Get a correct datetime object from a DataFrame column that has a random string mixed in with the date and time

I have a DataFrame like this:

       id                   Time
0      N01  Thu Sep 10 11:44:30 XYZ 2020
1      V33  Thu Sep 10 11:39:05 ABC 2020
2      N01  Thu Sep 10 11:44:30 XYZ 2020

I am trying to convert the Time column to datetime objects. If I use:

df1['Time'] = pd.to_datetime(df1['Time'])

it raises a warning:

UnknownTimezoneWarning: tzname BRT identified but not understood.  Pass `tzinfos` argument in order to correctly return a timezone-aware datetime.  In a future version, this will raise an exception.
  category=UnknownTimezoneWarning)

I am aware that there is a format argument in pd.to_datetime() to pass the input format. But I don't know what to pass as format to bypass the random strings in the middle of the Time column.

Is there any way to correctly get the datetime object from the Time column so that the random strings don't have any effect?

If the strings you want to remove are always runs of uppercase letters, you can strip them with a regular expression and then parse what is left with an explicit format:

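import pandas as pd

data = {'id': ['N01', 'V33', 'N01'],
        'time': ['Thu Sep 10 11:44:30 XYZ 2020', 'Thu Sep 10 11:39:05 ABC 2020', 'Thu Sep 10 11:44:30 XYZ 2020']}

df = pd.DataFrame(data)
# Drop the three-letter uppercase token. regex=True is needed on pandas 2.0+,
# where str.replace treats the pattern as a literal string by default.
cleaned = df['time'].str.replace(r'[A-Z]{3}', '', regex=True)
# Removing the token leaves two spaces before the year, hence two in the format.
df['time'] = pd.to_datetime(cleaned, format='%a %b %d %H:%M:%S  %Y')
print(df)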

Result:

    id                time
0  N01 2020-09-10 11:44:30
1  V33 2020-09-10 11:39:05
2  N01 2020-09-10 11:44:30
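
If the string in the middle is not guaranteed to be uppercase letters, one possible variation is to drop it by position instead of by content. The following is only a sketch: it assumes every value has exactly six space-separated fields and that the unwanted token is always the fifth one, which holds for the sample data in the question but may not hold for the real column.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['N01', 'V33', 'N01'],
    'Time': ['Thu Sep 10 11:44:30 XYZ 2020',
             'Thu Sep 10 11:39:05 ABC 2020',
             'Thu Sep 10 11:44:30 XYZ 2020'],
})

# Assumption: six space-separated fields, unwanted token in fifth position.
# Keep fields 1-4 and field 6, whatever the fifth field contains.
pattern = r'^(\S+ \S+ \S+ \S+) \S+ (\S+)$'
cleaned = df['Time'].str.replace(pattern, r'\1 \2', regex=True)

# The cleaned strings look like 'Thu Sep 10 11:44:30 2020'.
df['Time'] = pd.to_datetime(cleaned, format='%a %b %d %H:%M:%S %Y')
print(df)
```

Rows that do not fit the six-field pattern are left unchanged by str.replace and will then fail to parse, so a mismatch shows up as an error rather than a silently wrong date.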


 