datetime conversion ValueError Pandas
I have a dataset with invalid times (24:00:00 to 26:18:00). What is the best approach to deal with this kind of data in Python?
I tried to convert the column from object to datetime using this code:
stopTimeArrDep['departure_time'] = pd.to_datetime(stopTimeArrDep['departure_time']\
,format='%H:%M:%S')
But I get this error:
ValueError: time data '24:04:00' does not match format '%H:%M:%S' (match)
So I tried adding errors='coerce' to avoid this error. But I end up with empty values for the invalid times and an unwanted date added to every other row.
stopTimeArrDep['departure_time'] = pd.to_datetime(stopTimeArrDep['departure_time']\
,format='%H:%M:%S',errors='coerce')
Output sample:
original_col    converted_col
23:45:00        1/1/00 23:45:00
23:51:00        1/1/00 23:51:00
24:04:00
23:42:00        1/1/00 23:42:00
26:01:00
Any suggestions on the best approach to handle this issue? Thank you.
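For reference, the behaviour described above can be reproduced with a small sketch. The series below is made-up sample data, not the real stopTimeArrDep column. It shows that %H only accepts hours 00 to 23, so errors='coerce' turns the out-of-range rows into NaT, and that a time-only format is filled in with the default date 1900-01-01.

```python
import pandas as pd

# Made-up sample values standing in for the departure_time column
times = pd.Series(["23:45:00", "24:04:00", "26:01:00"])

# %H only matches hours 00-23; anything else is coerced to NaT
converted = pd.to_datetime(times, format="%H:%M:%S", errors="coerce")

print(converted)
print(converted.isna().tolist())
```

So the empty cells are the rows with an hour of 24 or more, and the added date is just the parser's default.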
You could treat the original_col as an elapsed time interval rather than a time of day, if that makes sense. You could use datetime.timedelta and add that timedelta to a datetime.datetime to get a datetime object, which you can then use to get the date and the time separately.
from datetime import datetime, timedelta
time_string = "20:30:20"
t = datetime.utcnow()
print('t: {}'.format(t))
HH, MM, SS = [int(x) for x in time_string.split(':')]
dt = timedelta(hours=HH, minutes=MM, seconds=SS)
print('dt: {}'.format(dt))
t2 = t + dt
print('t2: {}'.format(t2))
print('t2.date: {} | t2.time: {}'.format(str(t2.date()), str(t2.time()).split('.')[0]))
Output:
t: 2019-10-24 04:43:08.255027
dt: 20:30:20
t2: 2019-10-25 01:13:28.255027
t2.date: 2019-10-25 | t2.time: 01:13:28
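The example above uses a time below 24 hours. As a minimal sketch of the same idea applied to the problematic values, the snippet below adds them to midnight of a fixed day. The helper name parse_elapsed and the date 2019-10-24 are assumptions made for illustration only.

```python
from datetime import datetime, timedelta


def parse_elapsed(time_string):
    """Parse 'HH:MM:SS' into a timedelta; HH may be 24 or more."""
    hh, mm, ss = (int(x) for x in time_string.split(":"))
    return timedelta(hours=hh, minutes=mm, seconds=ss)


# Hypothetical reference day; midnight is the starting point
service_day = datetime(2019, 10, 24)

for s in ["23:45:00", "24:04:00", "26:01:00"]:
    print(s, "->", service_day + parse_elapsed(s))
```

Hours of 24 or more simply roll over to the next calendar day, so 26:01:00 becomes 02:01:00 on 2019-10-25.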
For your use case
import pandas as pd
from datetime import datetime, timedelta

# Define custom function: parse "HH:MM:SS" (hours may exceed 23) into a timedelta
def process_row(time_string):
    HH, MM, SS = [int(x) for x in time_string.split(':')]
    dt = timedelta(hours=HH, minutes=MM, seconds=SS)
    return dt

# Make dummy data
original_col = ["23:45:00", "23:51:00", "24:04:00", "23:42:00", "26:01:00"]
df = pd.DataFrame({'original_col': original_col, 'dt': None})
# Process dataframe: elapsed time plus a reference timestamp
df['dt'] = df.apply(lambda x: process_row(x['original_col']), axis=1)
df['t'] = datetime.utcnow()
df['t2'] = df['dt'] + df['t']
# Extract the date from the timestamp
df['Date'] = [datetime.date(d) for d in df['t2']]
# Extract the time from the timestamp
df['Time'] = [datetime.time(d) for d in df['t2']]
df
Using pandas.to_datetime():
pd.to_datetime(df['t2'], format='%H:%M:%S',errors='coerce')
Output:
0 2019-10-25 09:38:39.349410
1 2019-10-25 09:44:39.349410
2 2019-10-25 09:57:39.349410
3 2019-10-25 09:35:39.349410
4 2019-10-25 11:54:39.349410
Name: t2, dtype: datetime64[ns]
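As a possible alternative to the row-wise function, pandas can parse such strings directly with pd.to_timedelta, which does not cap the hour at 23. This is only a sketch: the column names elapsed and timestamp and the reference date 2019-10-24 are assumptions, and you would substitute whatever day the times actually belong to.

```python
import pandas as pd

df = pd.DataFrame(
    {"original_col": ["23:45:00", "23:51:00", "24:04:00", "23:42:00", "26:01:00"]}
)

# Vectorised parse of "HH:MM:SS" strings; 26:01:00 becomes 1 day 02:01:00
df["elapsed"] = pd.to_timedelta(df["original_col"])

# Hypothetical reference day (midnight) that the elapsed times are counted from
service_day = pd.Timestamp("2019-10-24")
df["timestamp"] = service_day + df["elapsed"]

print(df)
```

Keeping the elapsed column as a timedelta also lets you sort and subtract the values without attaching any date at all.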