
Update column based on DateTimeIndex from date range

I have a pandas DataFrame with a DatetimeIndex and an empty column called HOLIDAY.

I want to set the value of that column to 'YES' if the datetime in the index falls on a holiday, so that the resulting DataFrame looks like this:

TIME                    HOLIDAY
2019-11-25 06:00:00     NO
2019-11-26 21:00:00     NO
2019-11-27 18:00:00     NO
2019-11-28 08:00:00     YES
2019-11-29 08:00:00     NO
2019-11-30 08:00:00     NO

I have a list of dates:

holidays = ['2019-07-04', '2019-11-28','2019-12-25']
holidays = pd.to_datetime(holidays)

I tried this, but I get an error:

df.loc[df.index.date.isin(holidays), 'HOLIDAY'] = "YES"

What's the best way to achieve this?

Thank you

DatetimeIndex.date returns a NumPy array of Python datetime.date objects, not a pandas Series, so it has no .isin method; that is where the error comes from. In addition, holidays holds Timestamps while .date yields plain dates, so you must compare the same type on both sides:

If TIME is a regular column rather than the index, this will work:

m2 = df['TIME'].dt.date.isin(holidays.date)

or, if TIME is the index:

m2 = df.index.to_series().dt.date.isin(holidays.date)

df.loc[m2, 'HOLIDAY'] = "YES"

Output:

                    HOLIDAY
TIME                       
2019-11-25 06:00:00      NO
2019-11-26 21:00:00      NO
2019-11-27 18:00:00      NO
2019-11-28 08:00:00     YES
2019-11-29 08:00:00      NO
2019-11-30 08:00:00      NO
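
For reference, the steps above can be put together as a self-contained sketch. The sample frame is reconstructed from the question's table and is an assumption; np.where is a substitution used here to fill both 'YES' and 'NO' in one step, rather than the df.loc assignment shown in the answer.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (an assumption for illustration)
df = pd.DataFrame(
    {"HOLIDAY": [""] * 6},
    index=pd.DatetimeIndex(
        [
            "2019-11-25 06:00:00",
            "2019-11-26 21:00:00",
            "2019-11-27 18:00:00",
            "2019-11-28 08:00:00",
            "2019-11-29 08:00:00",
            "2019-11-30 08:00:00",
        ],
        name="TIME",
    ),
)

holidays = pd.to_datetime(["2019-07-04", "2019-11-28", "2019-12-25"])

# Compare datetime.date objects on both sides: the index (as a Series)
# is reduced to dates, and so is the holiday list
mask = df.index.to_series().dt.date.isin(holidays.date)

# Fill every row: 'YES' where the date is a holiday, 'NO' otherwise
df["HOLIDAY"] = np.where(mask, "YES", "NO")
print(df)
```

If the column is already filled with 'NO', the df.loc[m2, 'HOLIDAY'] = "YES" assignment from the answer is enough on its own.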

Note that:

  • holidays contains, among others, 2019-11-28 at midnight,
  • your DataFrame also contains 2019-11-28, but at 8:00.

If you want to find rows whose index values fall on one of the holidays dates (regardless of the time part), you have to strip the time part, i.e. reset it to midnight.
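
A minimal sketch of why the time part matters, assuming a small made-up index idx. Both floor('D') and normalize() reset the time to 00:00:00, so either can be used before calling isin.

```python
import pandas as pd

# Hypothetical two-row index, taken from the question's sample values
idx = pd.to_datetime(["2019-11-28 08:00:00", "2019-11-29 08:00:00"])
holidays = pd.to_datetime(["2019-07-04", "2019-11-28", "2019-12-25"])

# Without stripping the time, 08:00 never equals midnight, so nothing matches
raw_mask = idx.isin(holidays)

# Two equivalent ways to reset the time part to midnight
floored = idx.floor("D")
normalized = idx.normalize()

# After flooring, the 2019-11-28 entry matches the holiday list
day_mask = floored.isin(holidays)

print(raw_mask)
print(day_mask)
```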

One way to get the rows in question is to use boolean indexing:

df[df.index.floor('D').isin(holidays)]

The result is:

                    HOLIDAY
TIME                       
2019-11-28 08:00:00     YES

You can also get only the HOLIDAY column by running:

df[df.index.floor('D').isin(holidays)].HOLIDAY

This time the result (a Series) is:

TIME
2019-11-28 08:00:00    YES
Name: HOLIDAY, dtype: object
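
The same floored mask can also be used to assign rather than select, which is what the question asks for. A minimal sketch, assuming a sample frame reconstructed from the question with HOLIDAY pre-filled with 'NO':

```python
import pandas as pd

# Sample data reconstructed from the question (an assumption for illustration)
df = pd.DataFrame(
    {"HOLIDAY": ["NO"] * 6},
    index=pd.DatetimeIndex(
        [
            "2019-11-25 06:00:00",
            "2019-11-26 21:00:00",
            "2019-11-27 18:00:00",
            "2019-11-28 08:00:00",
            "2019-11-29 08:00:00",
            "2019-11-30 08:00:00",
        ],
        name="TIME",
    ),
)
holidays = pd.to_datetime(["2019-07-04", "2019-11-28", "2019-12-25"])

# Boolean mask: True where the index date (time reset to midnight) is a holiday
is_holiday = df.index.floor("D").isin(holidays)

# Overwrite only the matching rows
df.loc[is_holiday, "HOLIDAY"] = "YES"
print(df)
```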
