简体   繁体   中英

Python, Pandas: tz_localize AmbiguousTimeError: Cannot infer dst time with non DST dates

I'm trying to import some timeseries data and convert it into UTC so I can merge it with another dataset. This data seems to have 24 hour data and doesn't have DST adjustments. This post gives a similar answer, but they simply drop the line. I need to shift it so I can merge it with my other data.

When I run my code:

df = pd.read_csv('http://rredc.nrel.gov/solar/old_data/nsrdb/1991-2010/data/hourly/{}/{}_{}_solar.csv'.format(723898,723898,1998), usecols=["YYYY-MM-DD", "HH:MM (LST)","Meas Glo (Wh/m^2)","Meas Dir (Wh/m^2)","Meas Dif (Wh/m^2)"])

def clean_time(obj):
    hour = int(obj[0:-3])
    hour = str(hour - 1)
    if len(str(hour)) == 2:
        return hour+":00"
    else:
        return "0" + hour + ":00"

df['HH:MM (LST)'] = df['HH:MM (LST)'].apply(clean_time)
df['DateTime'] = df['YYYY-MM-DD'] + " " + df['HH:MM (LST)']
df = df.set_index(pd.DatetimeIndex(df['DateTime']))
df.drop(["YYYY-MM-DD", "HH:MM (LST)",'DateTime'],axis=1,inplace=True)
df.index = df.index.tz_localize('US/Pacific', ambiguous='infer')

I get:

pytz.exceptions.AmbiguousTimeError: Cannot infer dst time from 1998-10-25 01:00:00 as there are no repeated times

If I leave ambiguous='raise' (the default), it gives me:

pytz.exceptions.NonExistentTimeError: 1998-04-05 02:00:00

So I'm stuck on either the start, or end of daylight savings time.

There's quite a few of these datasets (multiple sites over multiple years) I need to merge, so I'd prefer not to hand code specific hours to shift, but I'm still a novice and can't quite figure out my next steps here.

Appreciate the help!

Minimal reproduction scenario:

from datetime import datetime, timedelta

import pandas as pd


df = pd.DataFrame([[datetime(2019, 10, 27, 0) + timedelta(hours=i), i] for i in range(24)], columns=['dt', 'i']).set_index('dt')
df.index.tz_localize('Europe/Amsterdam', ambiguous='infer')

pytz.exceptions.AmbiguousTimeError: Cannot infer dst time from 2019-10-27 02:00:00 as there are no repeated times

Solution: manually specify which datetime objects must be considered DT (Daylight Time) or DST (Daylight Savings Time). See also: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.tz_localize.html

from datetime import datetime, timedelta
import numpy as np
import pandas as pd


df = pd.DataFrame([[datetime(2019, 10, 27, 0) + timedelta(hours=i), i] for i in range(24)], columns=['dt', 'i']).set_index('dt')
infer_dst = np.array([False] * df.shape[0])  # all False -> every row considered DT, alternative is True to indicate DST. The array must correspond to the iloc of df.index
df.index.tz_localize('Europe/Amsterdam', ambiguous=infer_dst)  # no error

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM