I created a holiday calendar for Germany (not all days included) as follows:
from pandas.tseries.holiday import Holiday, AbstractHolidayCalendar
class GermanHolidays(AbstractHolidayCalendar):
    rules = [Holiday('New Years Day', month=1, day=1),
             Holiday('First of May', month=5, day=1),
             Holiday('German Unity Day', month=10, day=3),
             ...]
cal = GermanHolidays()
Now I want a column that shows whether each row falls on a holiday ("1") or not ("0"). So I did the following:
holidays = cal.holidays(start=X['Time (CET)'].min(), end = X['Time (CET)'].max())
X['Holidays'] = X['Time (CET)'].isin(holidays)
X['Holidays'] = X['Holidays'].astype(float)
X is a dataframe where Time (CET) is a column in the format %d.%m.%Y %H:%M:%S. Unfortunately this is not working. No error is raised, but all rows are marked with "0", so no matching is happening and I really don't know why. I thought it might be because the frequency of holidays is daily and not hourly, as it is in the column Time (CET). It would be great if you could help me! Thank you!
There might be a few reasons for that.
One of them, as mentioned by @unutbu, is a wrong (string) dtype. Make sure your X['Time (CET)'] column is of datetime dtype. This can be done as follows:
X['Time (CET)'] = pd.to_datetime(X['Time (CET)'], dayfirst=True, errors='coerce')
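Since the question states the exact format, it can also be passed explicitly instead of relying on dayfirst. The following is only a small sketch; the sample values (including the deliberately invalid one) are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data in the format given in the question
X = pd.DataFrame({'Time (CET)': ['01.01.2017 00:00:00',
                                 '03.10.2017 14:30:00',
                                 'not a date']})

# Parse with the explicit format; values that do not match become NaT
X['Time (CET)'] = pd.to_datetime(X['Time (CET)'],
                                 format='%d.%m.%Y %H:%M:%S',
                                 errors='coerce')

# Check the result: the column should now have a datetime dtype
print(X.dtypes)
print(X['Time (CET)'].isna().sum(), 'value(s) could not be parsed')
```

Counting the NaT values afterwards is a quick way to see whether some rows did not match the expected format.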
Another reason, as you said, is the time part.
Here is a demo:
In [28]: df = pd.DataFrame({'Date':pd.date_range('2017-01-01 01:01:01',
freq='9H', periods=1000)})
yields:
In [30]: df
Out[30]:
Date
0 2017-01-01 01:01:01
1 2017-01-01 10:01:01
2 2017-01-01 19:01:01
3 2017-01-02 04:01:01
4 2017-01-02 13:01:01
5 2017-01-02 22:01:01
6 2017-01-03 07:01:01
7 2017-01-03 16:01:01
8 2017-01-04 01:01:01
9 2017-01-04 10:01:01
.. ...
990 2018-01-07 07:01:01
991 2018-01-07 16:01:01
992 2018-01-08 01:01:01
993 2018-01-08 10:01:01
994 2018-01-08 19:01:01
995 2018-01-09 04:01:01
996 2018-01-09 13:01:01
997 2018-01-09 22:01:01
998 2018-01-10 07:01:01
999 2018-01-10 16:01:01
[1000 rows x 1 columns]
Filtering by holidays doesn't work because the time part doesn't match:
In [29]: df.loc[df.Date.isin(holidays)]
Out[29]:
Empty DataFrame
Columns: [Date]
Index: []
We can make it work by normalizing our datetime column (truncating the time part, i.e. setting the time to 00:00:00):
In [31]: df.loc[df.Date.dt.normalize().isin(holidays)]
Out[31]:
Date
0 2017-01-01 01:01:01
1 2017-01-01 10:01:01
2 2017-01-01 19:01:01
320 2017-05-01 01:01:01
321 2017-05-01 10:01:01
322 2017-05-01 19:01:01
734 2017-10-03 07:01:01
735 2017-10-03 16:01:01
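Putting both points together for the column from the question might look like the sketch below. The four sample timestamps are made up, and the calendar only contains the three rules shown in the question. The normalized dates are also used for start and end, so that a holiday on the first day does not fall before a start that carries a time of day:

```python
import pandas as pd
from pandas.tseries.holiday import AbstractHolidayCalendar, Holiday


class GermanHolidays(AbstractHolidayCalendar):
    # Only the three rules shown in the question
    rules = [Holiday('New Years Day', month=1, day=1),
             Holiday('First of May', month=5, day=1),
             Holiday('German Unity Day', month=10, day=3)]


# Hypothetical sample data in the question's string format
X = pd.DataFrame({'Time (CET)': ['01.01.2017 13:00:00',
                                 '02.01.2017 00:00:00',
                                 '01.05.2017 08:15:00',
                                 '03.10.2017 23:59:59']})

# 1) make sure the column is datetime, not text
X['Time (CET)'] = pd.to_datetime(X['Time (CET)'], format='%d.%m.%Y %H:%M:%S')

# 2) drop the time part before comparing with the (midnight) holiday dates
days = X['Time (CET)'].dt.normalize()

cal = GermanHolidays()
holidays = cal.holidays(start=days.min(), end=days.max())

# 1 for rows that fall on a holiday, 0 otherwise
X['Holidays'] = days.isin(holidays).astype(int)
print(X)
```

With this sample data, every row except the one on 2 January is flagged with 1.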
This is basically what you already have. Given that this works and yours doesn't, it is likely because the values are text instead of timestamps as noted already by @unutbu and @MaxU.
Also, your post states:
displays when a holiday appears or not with ("1" or "0")
Did you really want a text value? You tried to convert to floats, but you probably just want integers.
X = pd.DataFrame({'Time (CET)': pd.date_range(start='2017-01-01', end='2017-12-31', freq='12H')})
X = X.assign(Holidays=X['Time (CET)'].isin(cal.holidays()).astype(int))
>>> X
Time (CET) Holidays
0 2017-01-01 00:00:00 1
1 2017-01-01 12:00:00 0
2 2017-01-02 00:00:00 0
...