
Holiday Calendar in pandas DataFrame

I created a holiday calendar for Germany (not all days included) as follows:

from pandas.tseries.holiday import Holiday,AbstractHolidayCalendar

class GermanHolidays(AbstractHolidayCalendar):
    rules = [Holiday('New Years Day', month=1, day=1),
             Holiday('First of May', month=5, day=1),
             Holiday('German Unity Day', month=10,day=3),
            ...]

cal = GermanHolidays()

Now I want a column that shows whether each row falls on a holiday ("1") or not ("0"). So I did the following:

holidays = cal.holidays(start=X['Time (CET)'].min(), end = X['Time (CET)'].max())
X['Holidays'] = X['Time (CET)'].isin(holidays)
X['Holidays'] = X['Holidays'].astype(float)

X is a DataFrame in which Time (CET) is a column in the format %d.%m.%Y %H:%M:%S. Unfortunately, this is not working. No error is raised, but every row is marked with "0", so nothing matches, and I really don't know why. I suspect it is because the holidays have a daily frequency, whereas the Time (CET) column is hourly. It would be great if you could help me. Thank you!

There might be a few reasons for that.

One of them, as mentioned by @unutbu, is a wrong (string) dtype. Make sure your X['Time (CET)'] column has a datetime dtype. This can be done as follows:

X['Time (CET)'] = pd.to_datetime(X['Time (CET)'], dayfirst=True, errors='coerce')
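
As a minimal sketch of that conversion, the snippet below parses strings in the question's %d.%m.%Y %H:%M:%S layout with an explicit format instead of dayfirst=True. The sample values, including the deliberately invalid one, are made up for illustration and are not from the original data.

```python
import pandas as pd

# Hypothetical sample in the question's %d.%m.%Y %H:%M:%S layout.
X = pd.DataFrame({'Time (CET)': ['01.01.2017 00:00:00',
                                 '01.05.2017 13:00:00',
                                 'not a date']})

# An explicit format removes any day/month ambiguity; errors='coerce'
# turns unparseable entries into NaT instead of raising.
X['Time (CET)'] = pd.to_datetime(X['Time (CET)'],
                                 format='%d.%m.%Y %H:%M:%S',
                                 errors='coerce')

# Check that the column is now a datetime column and count parse failures.
is_datetime = pd.api.types.is_datetime64_any_dtype(X['Time (CET)'])
n_failed = int(X['Time (CET)'].isna().sum())
print(is_datetime, n_failed)
```

Checking the number of NaT values after the conversion is a cheap way to notice rows that did not match the expected format.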

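Another reason, as you suspected, is the time part: the holidays are timestamps at midnight, and isin() only matches exactly equal values, so 2017-01-01 01:01:01 does not match the holiday 2017-01-01 00:00:00.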

Here is a demo:

In [28]: df = pd.DataFrame({'Date':pd.date_range('2017-01-01 01:01:01', 
                                                 freq='9H', periods=1000)})

yields:

In [30]: df
Out[30]:
                   Date
0   2017-01-01 01:01:01
1   2017-01-01 10:01:01
2   2017-01-01 19:01:01
3   2017-01-02 04:01:01
4   2017-01-02 13:01:01
5   2017-01-02 22:01:01
6   2017-01-03 07:01:01
7   2017-01-03 16:01:01
8   2017-01-04 01:01:01
9   2017-01-04 10:01:01
..                  ...
990 2018-01-07 07:01:01
991 2018-01-07 16:01:01
992 2018-01-08 01:01:01
993 2018-01-08 10:01:01
994 2018-01-08 19:01:01
995 2018-01-09 04:01:01
996 2018-01-09 13:01:01
997 2018-01-09 22:01:01
998 2018-01-10 07:01:01
999 2018-01-10 16:01:01

[1000 rows x 1 columns]

Filtering by holidays doesn't work because the time parts don't match:

In [29]: df.loc[df.Date.isin(holidays)]
Out[29]:
Empty DataFrame
Columns: [Date]
Index: []

We can make it work by normalizing our datetime column (truncating the time part, i.e. setting the time to 00:00:00):

In [31]: df.loc[df.Date.dt.normalize().isin(holidays)]
Out[31]:
                   Date
0   2017-01-01 01:01:01
1   2017-01-01 10:01:01
2   2017-01-01 19:01:01
320 2017-05-01 01:01:01
321 2017-05-01 10:01:01
322 2017-05-01 19:01:01
734 2017-10-03 07:01:01
735 2017-10-03 16:01:01
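
The difference between the two comparisons can be reproduced in isolation. This is only a sketch: the calendar contains just the three rules visible in the question (the real list is longer), and the two timestamps are made up.

```python
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday


class GermanHolidays(AbstractHolidayCalendar):
    # Only the three rules shown in the question; the original list is longer.
    rules = [Holiday('New Years Day', month=1, day=1),
             Holiday('First of May', month=5, day=1),
             Holiday('German Unity Day', month=10, day=3)]


holidays = GermanHolidays().holidays(start='2017-01-01', end='2017-12-31')

# Hypothetical timestamps: one on a holiday, one on an ordinary day.
stamps = pd.Series(pd.to_datetime(['2017-05-01 10:01:01',
                                   '2017-05-02 10:01:01']))

exact = stamps.isin(holidays)                  # compares full timestamps
by_day = stamps.dt.normalize().isin(holidays)  # compares calendar days only

print(holidays)         # the holiday timestamps all sit at 00:00:00
print(exact.tolist())   # [False, False]
print(by_day.tolist())  # [True, False]
```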

This is basically what you already have. Given that this works and yours doesn't, it is likely because the values are text instead of timestamps as noted already by @unutbu and @MaxU.

Also, your post states:

displays when a holiday appears or not with ("1" or "0")

Did you really want a text value? You tried to convert to floats, but you probably just want integers.

X = pd.DataFrame({'Time (CET)': pd.date_range(start='2017-01-01', end='2017-12-31', freq='12h')})
X = X.assign(Holidays=X['Time (CET)'].isin(cal.holidays()).astype(int))
>>> X
             Time (CET)  Holidays
0   2017-01-01 00:00:00         1
1   2017-01-01 12:00:00         0
2   2017-01-02 00:00:00         0
...
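
Putting the suggestions from both answers together, one possible end-to-end version for string timestamps with a time part might look like the sketch below. The sample rows are hypothetical, the calendar again contains only the three rules shown in the question, and taking the holiday range from the normalized days (rather than the raw minimum and maximum) is a choice made here, not something stated in the original posts.

```python
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday


class GermanHolidays(AbstractHolidayCalendar):
    # Only the three rules shown in the question; the original list is longer.
    rules = [Holiday('New Years Day', month=1, day=1),
             Holiday('First of May', month=5, day=1),
             Holiday('German Unity Day', month=10, day=3)]


# Hypothetical rows in the question's %d.%m.%Y %H:%M:%S layout.
X = pd.DataFrame({'Time (CET)': ['01.01.2017 00:00:00',
                                 '01.01.2017 12:00:00',
                                 '02.01.2017 12:00:00',
                                 '03.10.2017 18:00:00']})

# 1. Convert the strings to real timestamps.
X['Time (CET)'] = pd.to_datetime(X['Time (CET)'], format='%d.%m.%Y %H:%M:%S')

# 2. Drop the time of day so that every row is compared by calendar day.
days = X['Time (CET)'].dt.normalize()

# 3. Build the holiday list over the same day range and flag matches as 1/0.
holidays = GermanHolidays().holidays(start=days.min(), end=days.max())
X['Holidays'] = days.isin(holidays).astype(int)

print(X)
```

With this input, the two rows on 1 January and the row on 3 October are flagged with 1, and the row on 2 January with 0.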
