
How to make pandas' timedeltas timezone aware?

If I do

import pandas as pd
pd.to_datetime("2020-03-08") + pd.to_timedelta('1D')

I get Timestamp('2020-03-09 00:00:00') as expected.

But when I try the same thing with a timezone-aware timestamp:

pd.to_datetime("2020-03-08").tz_localize('America/New_York') + pd.to_timedelta('1D')

I get Timestamp('2020-03-09 01:00:00-0400', tz='America/New_York') which is one hour after midnight.

This actually makes sense once you realise that 2020-03-08 is the day the clocks move forward for daylight saving time in that timezone, so the day is only 23 hours long. But I have a use case where I want a time delta that is always one "local time" day long.
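As a quick check of the day lengths mentioned above, the elapsed time between consecutive local midnights can be measured directly. This is only an illustrative sketch; the variable names are made up for the example.

```python
import pandas as pd

tz = "America/New_York"

# Local midnight to local midnight across the spring-forward transition
spring_day = pd.Timestamp("2020-03-09", tz=tz) - pd.Timestamp("2020-03-08", tz=tz)

# Local midnight to local midnight across the fall-back transition
fall_day = pd.Timestamp("2020-11-02", tz=tz) - pd.Timestamp("2020-11-01", tz=tz)

print(spring_day)  # 0 days 23:00:00
print(fall_day)    # 1 days 01:00:00
```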

So is there a way of creating a "local time aware" timedelta object so that '1D' represents a calendar day whether the day is 23, 24 or 25 hours long?

What you could do is compare the values returned by the .dst() method of the two timestamps and adjust by 1 hour if a DST transition falls in between. You will also have to catch the case where adding the timedelta to the local wall-clock time would land exactly on an hour that does not exist in the timezone.

import pandas as pd
import pytz

def account_for_dst(t0, t1):
    """
    Adjust the timedelta between two timezone-aware timestamps t0 and t1
    for DST transitions.
    """
    # check whether the wall-clock result would fall on a non-existent hour:
    dt = t1 - t0
    try:
        _ = (t0.tz_localize(None) + dt).tz_localize(t0.tz)
    except pytz.NonExistentTimeError:
        return t0, t1  # t0 and t1 not modified...

    # otherwise, adjust the time delta...
    else:
        if t0.dst() > t1.dst():
            # DST ended between t0 and t1: add the hour back
            t1 += pd.to_timedelta('1h')
        elif t0.dst() < t1.dst():
            # DST started between t0 and t1: remove the extra hour
            t1 -= pd.to_timedelta('1h')
        return t0, t1

That gives results such as:

times = ("2020-3-7 02:00", "2020-3-8 00:00", "2020-11-1 00:00")

for t in times:
    t0 = pd.to_datetime(t).tz_localize('America/New_York')
    t1 = t0 + pd.to_timedelta('1D')
    print(f"before: {str(t0), str(t1)}")
    t0, t1 = account_for_dst(t0, t1)
    print(f"after: {str(t0), str(t1)}\n")   
    
# before: ('2020-03-07 02:00:00-05:00', '2020-03-08 03:00:00-04:00')
# after: ('2020-03-07 02:00:00-05:00', '2020-03-08 03:00:00-04:00')

# before: ('2020-03-08 00:00:00-05:00', '2020-03-09 01:00:00-04:00')
# after: ('2020-03-08 00:00:00-05:00', '2020-03-09 00:00:00-04:00')

# before: ('2020-11-01 00:00:00-04:00', '2020-11-01 23:00:00-05:00')
# after: ('2020-11-01 00:00:00-04:00', '2020-11-02 00:00:00-05:00') 
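A possibly simpler variant reuses the naive round trip from the check in account_for_dst: drop the timezone, add the timedelta on the wall clock, then localize again. The sketch below is not part of the answer above; add_wall_clock is a hypothetical helper, and its choices (shift non-existent times forward, treat ambiguous times as DST) are assumptions you may want to change. For whole calendar days, pd.DateOffset(days=1) is documented to keep the same local time on the next day, so it is shown for comparison.

```python
import pandas as pd

def add_wall_clock(ts, delta, nonexistent="shift_forward", ambiguous=True):
    """Hypothetical helper: add `delta` in local wall-clock time."""
    # Drop the timezone so the addition happens on the local wall clock
    naive = ts.tz_localize(None) + delta
    # Re-attach the original timezone; the two keyword arguments decide what
    # happens if the result falls into a DST gap or a repeated hour
    return naive.tz_localize(ts.tz, nonexistent=nonexistent, ambiguous=ambiguous)

tz = "America/New_York"
one_day = pd.Timedelta(days=1)

spring = pd.Timestamp("2020-03-08 00:00", tz=tz)  # 23-hour day
fall = pd.Timestamp("2020-11-01 00:00", tz=tz)    # 25-hour day
gap = pd.Timestamp("2020-03-07 02:00", tz=tz)     # 02:00 does not exist on 2020-03-08

print(add_wall_clock(spring, one_day))  # 2020-03-09 00:00:00-04:00
print(add_wall_clock(fall, one_day))    # 2020-11-02 00:00:00-05:00
print(add_wall_clock(gap, one_day))     # 2020-03-08 03:00:00-04:00

# Built-in alternative for calendar days
print(spring + pd.DateOffset(days=1))   # 2020-03-09 00:00:00-04:00
```

Passing nonexistent="raise" instead would surface the gap case as an error, which is closer to what account_for_dst detects with its try/except.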
