
Get last date in each month of a time series in pandas

Currently I'm generating a DatetimeIndex using a certain function, zipline.utils.tradingcalendar.get_trading_days. The time series is roughly daily but with some gaps.

My goal is to get the last date in the DatetimeIndex for each month.

.to_period('M') and .to_timestamp('M') don't work, since they give the last calendar day of the month rather than the last date that actually occurs in the index for each month.

As an example, if this is my time series, I would want to select '2015-05-29', while the last day of the month is '2015-05-31'.

['2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22', '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01']

Condla's answer came closest to what I needed, except that since my time index stretched over more than a year, I needed to group by both month and year and then select the maximum date. Below is the code I ended up with.

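import pandas as pd

# tempTradeDays is the initial DatetimeIndex
dateRange = []
# Group by year first so the same month in different years stays separate
dictYears = tempTradeDays.groupby(tempTradeDays.year)
for yr in dictYears.keys():
    # Within one year, group that year's dates by month
    yearDays = pd.DatetimeIndex(dictYears[yr])
    tempYear = yearDays.groupby(yearDays.month)
    for m in tempYear.keys():
        # The latest date in the group is the last date present for that month
        dateRange.append(max(tempYear[m]))
# DatetimeIndex.order() was removed from pandas; sort_values() replaces it
dateRange = pd.DatetimeIndex(dateRange).sort_values()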
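The year-and-month grouping can also be sketched more compactly by grouping on monthly periods, which already combine year and month. This is only an illustrative sketch, not the code from the original post: the name trading_days is hypothetical, and the dates are the sample from the question.

```python
import pandas as pd

# Hypothetical stand-in for the trading-day index (sample dates from the question)
trading_days = pd.DatetimeIndex([
    "2015-05-18", "2015-05-19", "2015-05-20", "2015-05-21", "2015-05-22",
    "2015-05-26", "2015-05-27", "2015-05-28", "2015-05-29", "2015-06-01",
])

# A monthly period such as 2015-05 encodes year and month together,
# so equal months from different years never end up in the same group.
dates = trading_days.to_series()
last_per_month = pd.DatetimeIndex(
    dates.groupby(trading_days.to_period("M")).max()
)

print(last_per_month)
```

Taking the maximum per group does not rely on the index being sorted.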

My strategy would be to group by month and then select the "maximum" of each group:

If "dt" is your DatetimeIndex object:

last_dates_of_the_month = []
dt_month_group_dict = dt.groupby(dt.month)
for month in dt_month_group_dict:
    last_date = max(dt_month_group_dict[month])
    last_dates_of_the_month.append(last_date)

The list "last_dates_of_the_month" contains the last occurring date of each month in your dataset. You can use this list to create a DatetimeIndex in pandas again (or whatever you want to do with it).
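One caveat with grouping on the month number alone: if the index spans more than one year, the same month from different years falls into a single group. The sketch below illustrates this on a hypothetical two-year business-day index (built with pd.bdate_range purely for illustration) and compares it with grouping on a "year-month" string key, which is one possible way to keep the years apart.

```python
import pandas as pd

# Hypothetical two-year index of business days, for illustration only
dt = pd.bdate_range("2014-01-01", "2015-12-31")

# Grouping by month number merges January 2014 with January 2015, and so on
by_month = dt.groupby(dt.month)
last_by_month_only = [dates.max() for dates in by_month.values()]

# Grouping by a "YYYY-MM" key keeps each year's months apart
by_year_month = dt.groupby(dt.strftime("%Y-%m"))
last_by_year_month = sorted(dates.max() for dates in by_year_month.values())

print(len(last_by_month_only))   # one entry per month number
print(len(last_by_year_month))   # one entry per month of each year
```

With month-only grouping, every result comes from the later year, so the month ends of the earlier year are lost.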

This is an old question, but none of the existing answers here is perfect. This is the solution I came up with (assuming that the date index is sorted). It could even be written in one line, but I split it for readability:

month1 = pd.Series(apple.index.month)
month2 = pd.Series(apple.index.month).shift(-1)
mask = (month1 != month2)
apple[mask.values].head(10)

A few notes here:

  • Shifting a datetime series requires another pd.Series instance (see here)
  • Boolean mask indexing requires .values (see here)
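To make the shift-and-compare idea concrete, here is a small self-contained sketch. The frame apple and its close column are hypothetical stand-ins for the data in the answer, and the dates are the sample from the question.

```python
import pandas as pd

idx = pd.DatetimeIndex([
    "2015-05-18", "2015-05-19", "2015-05-20", "2015-05-21", "2015-05-22",
    "2015-05-26", "2015-05-27", "2015-05-28", "2015-05-29", "2015-06-01",
])
# Hypothetical data; only the (sorted) index matters for the technique
apple = pd.DataFrame({"close": range(len(idx))}, index=idx)

months = pd.Series(apple.index.month)
# Compare each row's month with the month of the following row.
# The final row is compared with NaN, so it is always kept.
mask = months != months.shift(-1)

# .values drops the mask's integer index so it is applied positionally
last_rows = apple[mask.values]
print(last_rows)
```

A row is kept whenever the next row belongs to a different month, which is why the index has to be sorted for this to work.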

By the way, when the dates are business days, it'd be easier to use resampling: apple.resample('BM')
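As a hedged sketch of the resampling route: in current pandas, resample returns a resampler object that still needs an aggregation, and the resulting bins are labelled with the business month end rather than with the last date actually observed. Aggregating the dates themselves is one way to recover the observed dates. The example below uses the offset object pd.offsets.BMonthEnd() instead of the 'BM' string, since the string aliases have changed between pandas versions; apple and close are hypothetical.

```python
import pandas as pd

idx = pd.DatetimeIndex([
    "2015-05-18", "2015-05-19", "2015-05-20", "2015-05-21", "2015-05-22",
    "2015-05-26", "2015-05-27", "2015-05-28", "2015-05-29", "2015-06-01",
])
apple = pd.DataFrame({"close": range(len(idx))}, index=idx)  # hypothetical data

# Turn the index into a Series so the dates can be aggregated as values
observed = apple.index.to_series()

# One bin per business month; max() yields the last date seen in each bin.
# dropna() discards business months that contain no observations.
last_dates = observed.resample(pd.offsets.BMonthEnd()).max().dropna()
print(last_dates)
```

The values of last_dates are the observed dates, while its index holds the bin labels, which need not be dates that occur in the data.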

Suppose your data frame looks like this:

[image: original dataframe]

Then the following code will give you the last day of each month.

df_monthly = df.reset_index().groupby([df.index.year,df.index.month],as_index=False).last().set_index('index')

[image: transformed dataframe]

This one-line code does its job :)
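Since the screenshots are not available, here is a small self-contained sketch of the same idea on a DataFrame with a date index. It is a variation rather than the one-liner above: it groups on monthly periods and uses tail(1), which returns the last row of each group unchanged and keeps the original index. The column name price and its values are made up for illustration.

```python
import pandas as pd

idx = pd.DatetimeIndex([
    "2015-05-18", "2015-05-19", "2015-05-20", "2015-05-21", "2015-05-22",
    "2015-05-26", "2015-05-27", "2015-05-28", "2015-05-29", "2015-06-01",
])
# Hypothetical values; the frame must be sorted by date
df = pd.DataFrame({"price": [10.0, 10.5, 10.2, 10.8, 11.0,
                             10.9, 11.3, 11.1, 11.6, 11.4]}, index=idx)

# Monthly periods combine year and month, so multi-year data is handled too
df_monthly = df.groupby(df.index.to_period("M")).tail(1)
print(df_monthly)
```

One design difference worth knowing: GroupBy.last() takes the last non-null entry of each column separately, whereas tail(1) takes the final row as a whole, including any missing values.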

Maybe the answer is not needed anymore, but while searching for an answer to the same question I found what may be a simpler solution:

import pandas as pd 

sample_dates = pd.date_range(start='2010-01-01', periods=100, freq='B')
month_end_dates = sample_dates[sample_dates.is_month_end]
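Note that is_month_end tests whether a date is the calendar month end, so it only helps when that day is present in the index. A quick check on the sample dates from the question (the name trading_days is hypothetical) shows the difference:

```python
import pandas as pd

trading_days = pd.DatetimeIndex([
    "2015-05-18", "2015-05-19", "2015-05-20", "2015-05-21", "2015-05-22",
    "2015-05-26", "2015-05-27", "2015-05-28", "2015-05-29", "2015-06-01",
])

# 2015-05-31 is not in the index, so no date is flagged as a month end,
# even though 2015-05-29 is the last date available for May.
calendar_month_ends = trading_days[trading_days.is_month_end]
print(len(calendar_month_ends))
```

For an index with gaps, such as a trading calendar, a grouping-based approach is therefore closer to what the question asks for.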

Try this: create a new diff column in which the value 1 marks the change from one month to the next.

df['diff'] = np.where(df['Date'].dt.month.diff() != 0, 1, 0)
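A short sketch of how this flag behaves, assuming a DataFrame with a sorted datetime column named Date as in the snippet. The column names first_of_month and last_of_month are made up for illustration. As far as I can tell, diff() compares each row with the previous one and therefore flags the first row of each month, while diff(-1) compares with the following row and flags the last one.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime([
    "2015-05-18", "2015-05-19", "2015-05-20", "2015-05-21", "2015-05-22",
    "2015-05-26", "2015-05-27", "2015-05-28", "2015-05-29", "2015-06-01",
])})

month = df["Date"].dt.month

# Compared with the previous row: 1 on the first row of each month.
# The very first row has no predecessor (NaN), so it is flagged as well.
df["first_of_month"] = np.where(month.diff() != 0, 1, 0)

# Compared with the next row: 1 on the last row of each month.
# The very last row has no successor (NaN), so it is flagged as well.
df["last_of_month"] = np.where(month.diff(-1) != 0, 1, 0)

last_dates = df.loc[df["last_of_month"] == 1, "Date"]
print(last_dates)
```

Like the shift-based answer, this relies on the dates being sorted.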
