I need to quickly turn an ISO 8601 datetime string (with no timezone in the string, but known to be in the US/Pacific timezone) into a numpy datetime64 object.
If my machine were in US/Pacific time, I could simply run numpy.datetime64(s). However, this assumes that strings without timezones are in the local timezone. Furthermore, I can't easily specify the US/Pacific timezone in ISO 8601 format, because the offset is sometimes -0800 and sometimes -0700, depending on daylight saving time.
So far, the fastest solution I have is numpy.datetime64(pandas.Timestamp(s).tz_localize(tz='US/Pacific', ambiguous=True)). This takes 70 µs on my machine. It would be good if I could get this at least an order of magnitude faster (numpy.datetime64(s) in local time takes 4 µs but is incorrect, as described above). Is this possible?
First, note that without the offset, some local times, and therefore their datetime strings, are ambiguous. For example, the ISO 8601 datetime strings
2000-10-29T01:00:00-07:00
2000-10-29T01:00:00-08:00
both map to the same string 2000-10-29T01:00:00 when the offset is removed.
So it may not always be possible to reconstitute a unique timezone-aware datetime from a datetime string without offset.
However, we could make a choice in these ambiguous situations and accept that not all ambiguous dates will be correctly converted.
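To make the ambiguity concrete, here is a small sketch using only the standard library's datetime module. It is not part of the benchmark in this answer, and the variable names are purely illustrative. It parses the two strings shown above and checks that they are different instants that share one wall-clock time:

```python
from datetime import datetime, timedelta

# The two offset-aware strings from the example above.
first = datetime.fromisoformat("2000-10-29T01:00:00-07:00")   # still on daylight time
second = datetime.fromisoformat("2000-10-29T01:00:00-08:00")  # back on standard time

# Dropping the offset leaves the same naive wall-clock time for both.
same_wall_time = first.replace(tzinfo=None) == second.replace(tzinfo=None)

# As instants, however, they are one hour apart.
gap = second - first

print(same_wall_time, gap)
```

Any converter that only sees 2000-10-29T01:00:00 has to pick one of the two instants, so at most one of them can be recovered.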
If you are using Unix, you can use time.tzset to change the process's local timezone:
import os
import time
os.environ['TZ'] = tz
time.tzset()
You could then convert the datetime strings to NumPy datetime64's using
import numpy as np

def using_tzset(date_strings, tz):
    # Switch the process's local timezone before parsing (Unix only)
    os.environ['TZ'] = tz
    time.tzset()
    return np.array(date_strings, dtype='datetime64[ns]')
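One side effect worth keeping in mind: setting TZ and calling time.tzset changes the timezone for the whole process, and it stays changed after the function returns. If that matters, one option is to wrap the switch in a context manager that restores the previous setting. The helper below, temporary_tz, is a hypothetical name introduced here for illustration, and the sketch assumes a Unix system where time.tzset is available. Also note that the tzset approach relies on NumPy consulting the local timezone when parsing, as the question describes; NumPy 1.11 and later treat datetime64 as timezone-naive, so this only applies to older versions.

```python
import contextlib
import os
import time

@contextlib.contextmanager
def temporary_tz(tz_name):
    """Set the process timezone inside a with-block, then restore it (Unix only)."""
    previous = os.environ.get('TZ')  # None means TZ was not set before
    os.environ['TZ'] = tz_name
    time.tzset()
    try:
        yield
    finally:
        # Put the environment back exactly as it was found.
        if previous is None:
            os.environ.pop('TZ', None)
        else:
            os.environ['TZ'] = previous
        time.tzset()
```

It would be used as `with temporary_tz('US/Pacific'): ...`, with the parsing call placed inside the block.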
Note, however, that using_tzset does not always produce the same value as the method you proposed:
import os
import time
import numpy as np
import pandas as pd
tz = 'US/Pacific'
N = 10**5
dates = pd.date_range('2000-1-1', periods=N, freq='H', tz=tz)
date_strings_tz = dates.format(formatter=lambda x: x.isoformat())
date_strings = [d.rsplit('-', 1)[0] for d in date_strings_tz]
def orig(date_strings, tz):
    return [np.datetime64(pd.Timestamp(s, tz=tz)) for s in date_strings]
def using_tzset(date_strings, tz):
    os.environ['TZ'] = tz
    time.tzset()
    return np.array(date_strings, dtype='datetime64[ns]')
npdates = dates.asi8.view('datetime64[ns]')
x = np.array(orig(date_strings, tz))
y = using_tzset(date_strings, tz)
df = pd.DataFrame({'dates': npdates, 'str': date_strings_tz, 'orig': x, 'using_tzset': y})
This indicates that the original method, orig, fails to recover the original date 172 times:
print((df['dates'] != df['orig']).sum())
172
while using_tzset fails 11 times:
print((df['dates'] != df['using_tzset']).sum())
11
Note, however, that the 11 times that using_tzset fails are due to the ambiguity in local datetimes caused by DST.
This shows some of the discrepancies:
mask = df['dates'] != df['using_tzset']
idx = np.where(mask.shift(1) | mask)[0]
print(df[['dates', 'str', 'using_tzset']].iloc[idx].head(6))
#                      dates                        str          using_tzset
# 7248   2000-10-29 08:00:00  2000-10-29T01:00:00-07:00  2000-10-29 08:00:00
# 7249   2000-10-29 09:00:00  2000-10-29T01:00:00-08:00  2000-10-29 08:00:00
# 15984  2001-10-28 08:00:00  2001-10-28T01:00:00-07:00  2001-10-28 08:00:00
# 15985  2001-10-28 09:00:00  2001-10-28T01:00:00-08:00  2001-10-28 08:00:00
# 24720  2002-10-27 08:00:00  2002-10-27T01:00:00-07:00  2002-10-27 08:00:00
# 24721  2002-10-27 09:00:00  2002-10-27T01:00:00-08:00  2002-10-27 08:00:00
As you can see, the discrepancies occur when the date strings in the str column become ambiguous once the offset is removed.
So using_tzset appears to produce the correct result up to ambiguous datetimes.
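If you want to know in advance which inputs fall into this ambiguous category, one way is to look for wall-clock strings that appear more than once after the offset is stripped. The sketch below uses only the standard library. The function names strip_offset and ambiguous_wall_times are made up for this illustration, and the rsplit trick assumes a negative UTC offset, which holds for US/Pacific but not for timezones east of Greenwich.

```python
from collections import Counter

def strip_offset(iso_string):
    # Same trick as in the benchmark setup: cut at the last '-'.
    # Assumption: the offset is negative (e.g. -07:00 or -08:00).
    return iso_string.rsplit('-', 1)[0]

def ambiguous_wall_times(strings_with_offset):
    # A wall-clock time is ambiguous if two different offsets map onto it.
    counts = Counter(strip_offset(s) for s in strings_with_offset)
    return sorted(s for s, n in counts.items() if n > 1)

sample = [
    "2000-10-29T00:00:00-07:00",
    "2000-10-29T01:00:00-07:00",
    "2000-10-29T01:00:00-08:00",
    "2000-10-29T02:00:00-08:00",
]
print(ambiguous_wall_times(sample))
```

Each wall-clock time reported this way corresponds to a pair of instants of which only one can be recovered from the offset-free string.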
Here is a timeit benchmark comparing orig and using_tzset:
In [95]: %timeit orig(date_strings, tz)
1 loops, best of 3: 5.43 s per loop
In [96]: %timeit using_tzset(date_strings, tz)
10 loops, best of 3: 41.7 ms per loop
So using_tzset is over 100x faster than orig when N = 10**5.