Get correct datetime object from dataframe column with random string present with date and time
I have a dataframe like this:
id Time
0 N01 Thu Sep 10 11:44:30 XYZ 2020
1 V33 Thu Sep 10 11:39:05 ABC 2020
2 N01 Thu Sep 10 11:44:30 XYZ 2020
I am trying to convert the Time column to datetime objects. If I use:
df1['Time'] = pd.to_datetime(df1['Time'])
It throws a warning message:
UnknownTimezoneWarning: tzname BRT identified but not understood. Pass `tzinfos` argument in order to correctly return a timezone-aware datetime. In a future version, this will raise an exception.
category=UnknownTimezoneWarning)
I am aware that pd.to_datetime() has a format argument for passing the input format, but I don't know what to pass as format to bypass the random strings in the middle of the Time column.
Is there any way to correctly get the datetime object from the Time column so that the random strings don't have any effect?
If the characters you want to remove are always a run of uppercase letters, you can strip them with a regex replacement and then parse the cleaned string with an explicit format:
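import pandas as pd

data = {'id': ['N01', 'V33', 'N01'],
        'time': ['Thu Sep 10 11:44:30 XYZ 2020', 'Thu Sep 10 11:39:05 ABC 2020', 'Thu Sep 10 11:44:30 XYZ 2020']}
df = pd.DataFrame(data)
# Replace the all-uppercase token (and the spaces around it) with a single space.
# regex=True is passed explicitly because newer pandas versions treat the pattern as a literal string by default.
cleaned = df['time'].str.replace(r'\s+[A-Z]+\s+', ' ', regex=True)
df['time'] = pd.to_datetime(cleaned, format='%a %b %d %H:%M:%S %Y')
print(df)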
Result:
id time
0 N01 2020-09-10 11:44:30
1 V33 2020-09-10 11:39:05
2 N01 2020-09-10 11:44:30
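The same idea can be wrapped in a small helper. This is only a sketch: the function name parse_time_column is hypothetical, and it assumes the unwanted token is always a whitespace-delimited run of uppercase letters and that the rest of the string follows the layout shown in the question. Passing errors="coerce" is an optional choice that turns rows which still do not match the format into NaT instead of raising.

```python
import pandas as pd


def parse_time_column(series: pd.Series) -> pd.Series:
    """Hypothetical helper: drop an all-uppercase token, then parse the rest."""
    # Replace the uppercase token and its surrounding whitespace with one space.
    cleaned = series.str.replace(r"\s+[A-Z]+\s+", " ", regex=True)
    # Rows that still do not match the format become NaT rather than raising.
    return pd.to_datetime(cleaned, format="%a %b %d %H:%M:%S %Y", errors="coerce")


df = pd.DataFrame({
    "id": ["N01", "V33", "N01"],
    "time": [
        "Thu Sep 10 11:44:30 XYZ 2020",
        "Thu Sep 10 11:39:05 ABC 2020",
        "Thu Sep 10 11:44:30 XYZ 2020",
    ],
})
df["time"] = parse_time_column(df["time"])
print(df)
```

The resulting column is timezone-naive, since the middle token is discarded rather than interpreted as a timezone.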