Pandas - compare day and month only against a datetime?

I want to compare a timestamp column of dtype datetime64[ns] against a datetime.date. I only want to compare on day and month.

df

                  timestamp  last_price
0 2023-01-22 14:15:06.033314     100.0
1 2023-01-25 14:15:06.213591     101.0
2 2023-01-30 14:15:06.313554     102.0
3 2023-03-31 14:15:07.018540     103.0

import datetime

cu_date = datetime.datetime.now().date()
cu_year = cu_date.year
check_end_date = datetime.datetime.strptime(f'{cu_year}-11-05', '%Y-%m-%d').date()
check_start_date = datetime.datetime.strptime(f'{cu_year}-03-12', '%Y-%m-%d').date()

# This is incorrect: month and day are compared independently, so a row whose day
# is greater than check_start_date.day can still fall in an earlier month.
daylight_off_df = df.loc[((df.timestamp.dt.month >= check_end_date.month) & (df.timestamp.dt.day >= check_end_date.day)) |
                         ((df.timestamp.dt.month <= check_start_date.month) & (df.timestamp.dt.day <= check_start_date.day))]
daylight_on_df = df.loc[((df.timestamp.dt.month <= check_end_date.month) & (df.timestamp.dt.day <= check_end_date.day)) &
                        ((df.timestamp.dt.month >= check_start_date.month) & (df.timestamp.dt.day >= check_start_date.day))]

I tried to come up with the logic for this but failed.

Expected output:

daylight_off_df

                  timestamp  last_price
0 2023-01-22 14:15:06.033314     100.0
1 2023-01-25 14:15:06.213591     101.0
2 2023-01-30 14:15:06.313554     102.0

daylight_on_df

                   timestamp  last_price
3 2023-03-31 14:15:07.018540     103.0

In short: split the dataframe by comparing day and month only, ignoring the year.

I would break these values out into their own columns and then query them:

df['day'] = df['timestamp'].dt.day_name()
df['month'] = df['timestamp'].dt.month_name()

Then do whatever you are looking for:

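# select the numeric column explicitly; mean() over the string 'day' column raises in recent pandas
df.groupby('month')['last_price'].mean()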

If you don't want to add extra columns to the table, the following attributes may help:

check_end_date.timetuple().tm_yday # returns day of the year 
#output 309  
check_start_date.timetuple().tm_yday
#output 71
df['timestamp'].dt.is_leap_year.astype(int) #returns 1 if year is a leapyear
#output 0 | 1
df['timestamp'].dt.dayofyear #returns day of the year
#output 
#0    22
#1    25
#2    30
#3    90
df['timestamp'].dt.dayofyear.between(a,b) #returns true if day is between a,b

There are now a few possible solutions. I think using between is probably the cleanest.

daylight_on_df4 = df.loc[df['timestamp'].dt.dayofyear.between(
    check_start_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int),
    check_end_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int))]
daylight_off_df4 = df.loc[~df['timestamp'].dt.dayofyear.between(
    check_start_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int),
    check_end_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int))]
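
As a self-contained sketch of the between approach, the snippet below rebuilds the sample frame from the question and applies the same split. The cutoff dates are pinned to 2023 here, which is an assumption made so the result does not depend on today's date. Note that the leap-year offset is only right when the cutoffs' tm_yday values were computed in a non-leap year; if cu_year is itself a leap year, tm_yday already includes 29 February and the offset would be counted twice.

```python
import datetime
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2023-01-22 14:15:06.033314',
        '2023-01-25 14:15:06.213591',
        '2023-01-30 14:15:06.313554',
        '2023-03-31 14:15:07.018540',
    ]),
    'last_price': [100.0, 101.0, 102.0, 103.0],
})

# Assumption: cutoffs pinned to a non-leap year (2023) instead of the current year.
check_start_date = datetime.date(2023, 3, 12)
check_end_date = datetime.date(2023, 11, 5)

day_of_year = df['timestamp'].dt.dayofyear
# 1 for rows whose year is a leap year, 0 otherwise.
leap_offset = df['timestamp'].dt.is_leap_year.astype(int)

# between() is inclusive on both ends by default.
in_range = day_of_year.between(
    check_start_date.timetuple().tm_yday + leap_offset,
    check_end_date.timetuple().tm_yday + leap_offset,
)

daylight_on_df = df.loc[in_range]
daylight_off_df = df.loc[~in_range]

print(daylight_on_df)
print(daylight_off_df)
```

Computing the mask once and negating it for the "off" frame keeps the two halves guaranteed to be complementary.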

Or the code could look like this:

# ">= 0" keeps the boundary days in the "on" set, matching the inclusive between() version;
# with strict "> 0" a row falling exactly on a cutoff would land in neither frame.
daylight_on_df3 = df.loc[((check_end_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int)) - df['timestamp'].dt.dayofyear >= 0)
                         & (df['timestamp'].dt.dayofyear - (df['timestamp'].dt.is_leap_year.astype(int) + check_start_date.timetuple().tm_yday) >= 0)]
daylight_off_df3 = df.loc[((check_end_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int)) - df['timestamp'].dt.dayofyear < 0)
                          | (df['timestamp'].dt.dayofyear - (check_start_date.timetuple().tm_yday + df['timestamp'].dt.is_leap_year.astype(int)) < 0)]

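All that daylight_on/off does now is check whether the day of the year falls within your range (accounting for leap years). If your start and end dates span a year boundary (e.g. 2022-11-19 to 2023-02-22), the formula may need to be rewritten, but I think it gives the general idea.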
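
One possible way to handle a range that crosses the year boundary, and to avoid the leap-year offset altogether, is to compare a month * 100 + day key instead of the day of year. This is only a sketch and not part of the answer above: in_month_day_range is a hypothetical helper name, and passing the cutoffs as (month, day) tuples is an assumption.

```python
import pandas as pd


def in_month_day_range(ts, start, end):
    """Boolean mask: True where the (month, day) of ts lies in [start, end].

    ts is a datetime64 Series; start and end are (month, day) tuples, inclusive.
    If start falls later in the calendar than end, the range is treated as
    wrapping around New Year.
    """
    key = ts.dt.month * 100 + ts.dt.day      # e.g. 31 March -> 331
    lo = start[0] * 100 + start[1]
    hi = end[0] * 100 + end[1]
    if lo <= hi:
        return key.between(lo, hi)           # ordinary range within one year
    return (key >= lo) | (key <= hi)         # range wraps past 31 December


# Sample data copied from the question.
df = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2023-01-22 14:15:06.033314',
        '2023-01-25 14:15:06.213591',
        '2023-01-30 14:15:06.313554',
        '2023-03-31 14:15:07.018540',
    ]),
    'last_price': [100.0, 101.0, 102.0, 103.0],
})

# 12 March to 5 November, as in the question.
on_mask = in_month_day_range(df['timestamp'], (3, 12), (11, 5))
daylight_on_df = df.loc[on_mask]
daylight_off_df = df.loc[~on_mask]

# 19 November to 22 February: a range that wraps around the year end.
winter_mask = in_month_day_range(df['timestamp'], (11, 19), (2, 22))
```

Because the key depends only on month and day, 29 February simply becomes 229 and no per-row leap-year correction is needed.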
