
How to remove a specific holiday from Pandas USFederalHolidayCalendar?

I'm trying to remove Columbus Day from pandas.tseries.holiday.USFederalHolidayCalendar.

This seems to be possible, as a one-time operation, with

from pandas.tseries.holiday import USFederalHolidayCalendar
cal = USFederalHolidayCalendar()
cal = cal.rules.pop(6)  # note: this rebinds cal to the popped Holiday, not to the calendar

However, if this code is within a function that gets called repeatedly (in a loop) to generate several independent outputs, I get the following error:

IndexError: pop index out of range

It gives me the impression that the calendar is not recreated from scratch on each call: as the loop progresses, it keeps popping the holiday at index 6 until fewer than seven holidays remain, and then throws the error.

I tried reloading via importlib.reload to no avail.

Any idea what I'm doing wrong?

The problem here is that rules is a class attribute (a list of Holiday objects). See the class definition from the pandas source:

class USFederalHolidayCalendar(AbstractHolidayCalendar):
    """
    US Federal Government Holiday Calendar based on rules specified by:
    https://www.opm.gov/policy-data-oversight/
       snow-dismissal-procedures/federal-holidays/
    """

    rules = [
        Holiday("New Years Day", month=1, day=1, observance=nearest_workday),
        USMartinLutherKingJr,
        USPresidentsDay,
        USMemorialDay,
        Holiday("July 4th", month=7, day=4, observance=nearest_workday),
        USLaborDay,
        USColumbusDay,
        Holiday("Veterans Day", month=11, day=11, observance=nearest_workday),
        USThanksgivingDay,
        Holiday("Christmas", month=12, day=25, observance=nearest_workday),
    ]
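As a quick check against pandas itself (a sketch, assuming pandas is installed; the names cal_a, cal_b, shared_a and shared_b are just for illustration), an instance created without a rules argument does not get its own copy. Its rules attribute is the very same list object as the class attribute:

```python
from pandas.tseries.holiday import USFederalHolidayCalendar

cal_a = USFederalHolidayCalendar()
cal_b = USFederalHolidayCalendar()

# Neither instance stores its own 'rules'; attribute lookup falls back to the class
shared_a = cal_a.rules is USFederalHolidayCalendar.rules
shared_b = cal_b.rules is cal_a.rules
print(shared_a, shared_b)  # True True
```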

Because the attribute is defined on the class, all instances share one underlying list. An in-place change made through any instance (such as pop) is therefore visible through every other instance, including instances created later. Here is an example that shows what's going on:

>>> class A:
...     rules = [0,1,2]
... 
>>> a1 = A()
>>> a2 = A()
>>> a1.rules.pop()
2
>>> a1.rules.pop()
1
>>> a2.rules.pop()
0
>>> a2.rules.pop()
IndexError: pop from empty list
>>> a3 = A()
>>> a3.rules
[]
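One general way to avoid this, sketched here on the toy class A above (this is not something pandas itself does), is to give each instance its own copy of the class-level default in __init__, so in-place edits stay local to that instance:

```python
class A:
    rules = [0, 1, 2]  # class-level default

    def __init__(self):
        # Shallow-copy the default so each instance owns a separate list
        self.rules = list(type(self).rules)

a1 = A()
a2 = A()
a1.rules.pop()
a1.rules.pop()
print(a1.rules)  # [0]
print(a2.rules)  # [0, 1, 2]
print(A.rules)   # [0, 1, 2]
```

The copy is shallow: the list is new, but its items are shared, which is fine as long as only the list itself is edited (as with pop).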

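Also, each module in Python is imported only once per process: later import statements (even inside a function) reuse the module object cached in sys.modules, so importing again does not give you a fresh class or a fresh rules list.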
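A small standard-library illustration of that caching (json is used purely as an example module): a second import statement, even one inside a function, hands back the object already stored in sys.modules instead of executing the module again.

```python
import sys
import json

def import_again():
    # The import statement finds 'json' in sys.modules and reuses it
    import json as json_again
    return json_again

same = import_again() is json is sys.modules["json"]
print(same)  # True
```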

# Import your library
from pandas.tseries.holiday import USFederalHolidayCalendar

# Get the index of Columbus Day in the class-level 'rules' list
columbus_index = next(i for i, rule in enumerate(USFederalHolidayCalendar.rules) if 'Columbus' in rule.name)

# Create your own class that inherits from 'USFederalHolidayCalendar'
# (reusing the name shadows the imported class from here on)
class USFederalHolidayCalendar(USFederalHolidayCalendar):
    # Exclude the Columbus Day entry; this builds a new list and leaves the parent's list untouched
    rules = USFederalHolidayCalendar.rules[:columbus_index] + USFederalHolidayCalendar.rules[columbus_index + 1:]

# Create an object from your class
cal = USFederalHolidayCalendar()
print(cal.rules)
[Holiday: New Years Day (month=1, day=1, observance=<function nearest_workday at 0x7f6afad571f0>),
 Holiday: Martin Luther King Jr. Day (month=1, day=1, offset=<DateOffset: weekday=MO(+3)>),
 Holiday: Presidents Day (month=2, day=1, offset=<DateOffset: weekday=MO(+3)>),
 Holiday: Memorial Day (month=5, day=31, offset=<DateOffset: weekday=MO(-1)>),
 Holiday: July 4th (month=7, day=4, observance=<function nearest_workday at 0x7f6afad571f0>),
 Holiday: Labor Day (month=9, day=1, offset=<DateOffset: weekday=MO(+1)>),
 Holiday: Veterans Day (month=11, day=11, observance=<function nearest_workday at 0x7f6afad571f0>),
 Holiday: Thanksgiving (month=11, day=1, offset=<DateOffset: weekday=TH(+4)>),
 Holiday: Christmas (month=12, day=25, observance=<function nearest_workday at 0x7f6afad571f0>)]
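A variant of the same idea, sketched under the assumption that matching on each holiday's name is acceptable (the class name NoColumbusCalendar is just an illustrative choice). Filtering by name avoids depending on position 6, since the contents of USFederalHolidayCalendar.rules can differ between pandas versions, and a distinct class name keeps the original calendar available:

```python
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, USFederalHolidayCalendar

class NoColumbusCalendar(AbstractHolidayCalendar):
    # New list built from the parent's rules; the parent's list is not modified
    rules = [rule for rule in USFederalHolidayCalendar.rules
             if "Columbus" not in rule.name]

full = USFederalHolidayCalendar().holidays(start="2021-01-01", end="2021-12-31")
trimmed = NoColumbusCalendar().holidays(start="2021-01-01", end="2021-12-31")

columbus_2021 = pd.Timestamp("2021-10-11")  # second Monday of October 2021
print(columbus_2021 in full)     # True
print(columbus_2021 in trimmed)  # False
```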


[T]his code is within a function that gets called repeatedly (in a loop) to generate several independent outputs ... I tried reloading via importlib.reload to no avail.

If you really want to import and pop inside the function, reload the holiday module like so:

from importlib import reload

def f():
    from pandas.tseries import holiday

    # reload `holiday` and pop Columbus Day
    holiday = reload(holiday)
    cal = holiday.USFederalHolidayCalendar()
    cal.rules.pop(6) # as HenryEcker noted, do not assign back to `cal`

    # just for demo, print the first 3 letters per remaining holiday
    print([rule.name[:3] for rule in cal.rules])

for _ in range(10):
    f()

To show that Columbus Day is gone, the first 3 letters of each remaining holiday are printed:

['New', 'Mar', 'Pre', 'Mem', 'Jul', 'Lab', 'Vet', 'Tha', 'Chr']
['New', 'Mar', 'Pre', 'Mem', 'Jul', 'Lab', 'Vet', 'Tha', 'Chr']
['New', 'Mar', 'Pre', 'Mem', 'Jul', 'Lab', 'Vet', 'Tha', 'Chr']
['New', 'Mar', 'Pre', 'Mem', 'Jul', 'Lab', 'Vet', 'Tha', 'Chr']
['New', 'Mar', 'Pre', 'Mem', 'Jul', 'Lab', 'Vet', 'Tha', 'Chr']
['New', 'Mar', 'Pre', 'Mem', 'Jul', 'Lab', 'Vet', 'Tha', 'Chr']
['New', 'Mar', 'Pre', 'Mem', 'Jul', 'Lab', 'Vet', 'Tha', 'Chr']
['New', 'Mar', 'Pre', 'Mem', 'Jul', 'Lab', 'Vet', 'Tha', 'Chr']
['New', 'Mar', 'Pre', 'Mem', 'Jul', 'Lab', 'Vet', 'Tha', 'Chr']
['New', 'Mar', 'Pre', 'Mem', 'Jul', 'Lab', 'Vet', 'Tha', 'Chr']
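One plausible reason (not a confirmed diagnosis) why importlib.reload can seem to do nothing: reloading re-executes the module and rebinds its attributes to new objects, but a name bound earlier with "from module import Class" still points at the old class and its old list. The sketch below demonstrates this with a throwaway module written to a temporary directory; the module name toy_holiday and its Calendar class are hypothetical stand-ins for pandas.tseries.holiday and USFederalHolidayCalendar.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a hypothetical module with a class-level list, mimicking the pandas layout
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "toy_holiday.py").write_text("class Calendar:\n    rules = [0, 1, 2]\n")
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()  # make sure the import system sees the new file

import toy_holiday
from toy_holiday import Calendar  # binds the current class object to a local name

Calendar().rules.pop()                       # mutates the shared class-level list
toy_holiday = importlib.reload(toy_holiday)  # re-executes the module file

print(toy_holiday.Calendar().rules)      # [0, 1, 2]  (new class, new list)
print(Calendar().rules)                  # [0, 1]     (old name, old class)
print(Calendar is toy_holiday.Calendar)  # False
```

This is why the answer above reloads the module and then reads the class through the module (holiday.USFederalHolidayCalendar) rather than through a previously imported name.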

But if possible, just pass cal into the function, so you only have to generate and modify cal once:

# define the function to accept `cal` to avoid repeated importing/reloading
def f(cal):
    print([rule.name[:3] for rule in cal.rules])

# generate `cal` and pop once
from pandas.tseries.holiday import USFederalHolidayCalendar
cal = USFederalHolidayCalendar()
cal.rules.pop(6) # as HenryEcker noted, do not assign back to `cal`

for _ in range(10):
    f(cal)
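Note that cal.rules.pop(6) above still edits the class-level list, so every USFederalHolidayCalendar created later in the same process is affected too. If that matters, one option is to pass a filtered copy through the constructor's rules argument, which AbstractHolidayCalendar.__init__ stores as an instance attribute. A minimal sketch, assuming name matching on "Columbus" is good enough (the helper calendar_without is hypothetical):

```python
from pandas.tseries.holiday import USFederalHolidayCalendar

def calendar_without(name_fragment):
    # Build a brand-new list; USFederalHolidayCalendar.rules itself is never mutated
    kept = [rule for rule in USFederalHolidayCalendar.rules
            if name_fragment not in rule.name]
    # Passing rules= sets an instance attribute that shadows the class attribute
    return USFederalHolidayCalendar(rules=kept)

original_count = len(USFederalHolidayCalendar.rules)
for _ in range(10):
    cal = calendar_without("Columbus")  # safe to call repeatedly
    print([rule.name[:3] for rule in cal.rules])

print(len(USFederalHolidayCalendar.rules) == original_count)  # True
```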

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM