Get last date in each month of a time series pandas
Currently I'm generating a DatetimeIndex using a certain function, zipline.utils.tradingcalendar.get_trading_days. The time series is roughly daily but with some gaps.
My goal is to get the last date in the DatetimeIndex for each month.
.to_period('M') and .to_timestamp('M') don't work, since they give the last day of the month rather than the last value of the variable in each month.
As an example, if this is my time series, I would want to select '2015-05-29', while the last day of the month is '2015-05-31'.
['2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21', '2015-05-22', '2015-05-26', '2015-05-27', '2015-05-28', '2015-05-29', '2015-06-01']
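To make the distinction concrete, here is a minimal sketch on the sample dates above. It uses pd.offsets.MonthEnd(0) as a stand-in for "calendar month end" (an assumption for illustration, not the exact to_period/to_timestamp calls from the question) and compares that with the last date actually present in each month.

```python
import pandas as pd

dates = pd.DatetimeIndex([
    "2015-05-18", "2015-05-19", "2015-05-20", "2015-05-21", "2015-05-22",
    "2015-05-26", "2015-05-27", "2015-05-28", "2015-05-29", "2015-06-01",
])

# Calendar month end: every date is rolled forward to the last day of its month.
calendar_month_ends = (dates + pd.offsets.MonthEnd(0)).unique()

# Last date actually present in the index for each monthly period.
last_observed = pd.DatetimeIndex(
    dates.to_series().groupby(dates.to_period("M")).max()
)

print(calendar_month_ends)
print(last_observed)
```

The first result contains dates such as 2015-05-31 that never appear in the series; the second contains only dates taken from the index itself.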
Condla's answer came closest to what I needed, except that since my time index stretched over more than a year, I needed to group by both month and year and then select the maximum date. Below is the code I ended up with.
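import pandas as pd

# tempTradeDays is the initial DatetimeIndex
dateRange = []
# Index.groupby returns a dict mapping each year to the dates in that year
dictYears = tempTradeDays.groupby(tempTradeDays.year)
for yr in dictYears.keys():
    yearDays = pd.DatetimeIndex(dictYears[yr])
    # within one year, map each month to its dates
    tempYear = yearDays.groupby(yearDays.month)
    for m in tempYear.keys():
        dateRange.append(max(tempYear[m]))
# .order() was removed from pandas; sort_values() is its replacement
dateRange = pd.DatetimeIndex(dateRange).sort_values()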
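For what it's worth, the same year-and-month grouping can probably be written without the nested loops. This is only a sketch: the tempTradeDays below is a made-up stand-in built with pd.bdate_range, not the output of get_trading_days.

```python
import pandas as pd

# Hypothetical stand-in for tempTradeDays: business days spanning two years.
tempTradeDays = pd.bdate_range("2014-11-03", "2015-02-13")

days = tempTradeDays.to_series()

# Group by (year, month) so the same month in different years stays separate,
# then take the latest date in each group.
dateRange = pd.DatetimeIndex(
    days.groupby([tempTradeDays.year, tempTradeDays.month]).max()
).sort_values()

print(dateRange)
```

Grouping on both keys at once gives one result per (year, month) pair, which is what the two nested loops build step by step.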
My strategy would be to group by month and then select the "maximum" of each group:
If "dt" is your DatetimeIndex object:
last_dates_of_the_month = []
dt_month_group_dict = dt.groupby(dt.month)
for month in dt_month_group_dict:
    last_date = max(dt_month_group_dict[month])
    last_dates_of_the_month.append(last_date)
The list "last_dates_of_the_month" contains the last date occurring in each month of your dataset. You can use this list to create a DatetimeIndex in pandas again (or whatever you want to do with it).
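One caveat worth illustrating: grouping by dt.month alone merges the same month from different years into one group. A small sketch with made-up dates, where grouping by monthly period is my own suggested variation rather than part of the answer above:

```python
import pandas as pd

# Made-up index covering May in two different years.
dt = pd.DatetimeIndex(["2014-05-28", "2014-05-30", "2015-05-27", "2015-05-29"])

# Grouping by month number alone puts both Mays into a single group.
by_month = dt.groupby(dt.month)
last_by_month = [max(by_month[m]) for m in by_month]

# Grouping by monthly period (year and month together) keeps them apart.
by_period = dt.groupby(dt.to_period("M"))
last_by_period = [max(by_period[p]) for p in sorted(by_period)]

print(last_by_month)
print(last_by_period)
```

With data inside a single year both versions agree; they only differ once the index spans more than one year.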
This is an old question, but none of the existing answers here is perfect. This is the solution I came up with (assuming that the date is a sorted index). It could even be written in one line, but I split it for readability:
month1 = pd.Series(apple.index.month)
month2 = pd.Series(apple.index.month).shift(-1)
mask = (month1 != month2)
apple[mask.values].head(10)
A few notes here:
Shifting a datetime series requires another pd.Series instance.
Boolean mask indexing requires .values.
By the way, when the dates are business days, it'd be easier to use resampling: apple.resample('BM') (in recent pandas versions an aggregation has to be called explicitly, e.g. apple.resample('BM').last()).
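Here is a self-contained sketch of the shift-and-mask idea. The apple series is made up for illustration (the question's sample dates with dummy values), since the original data is not shown.

```python
import pandas as pd

idx = pd.DatetimeIndex([
    "2015-05-18", "2015-05-19", "2015-05-20", "2015-05-21", "2015-05-22",
    "2015-05-26", "2015-05-27", "2015-05-28", "2015-05-29", "2015-06-01",
])
apple = pd.Series(range(len(idx)), index=idx)  # dummy values 0..9

# Month number of every row, as a Series so that it can be shifted.
month = pd.Series(apple.index.month)

# A row is the last of its month when the next row has a different month.
# The final row compares against NaN, which also counts as "different".
mask = month != month.shift(-1)

# mask has a RangeIndex while apple has a DatetimeIndex, hence .values.
last_rows = apple[mask.values]
print(last_rows)
```

Note that the very last row is always selected, even if its month is not complete yet.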
Suppose your data frame is indexed by dates (an unnamed DatetimeIndex). Then the following code will give you the last row of each month.
df_monthly = df.reset_index().groupby([df.index.year,df.index.month],as_index=False).last().set_index('index')
This one-liner does the job :)
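A related sketch, on a made-up data frame with a hypothetical "price" column: it uses the same year-and-month grouping but with tail(1) instead of last(), which is my own variation and keeps the original DatetimeIndex without the reset_index/set_index round trip.

```python
import pandas as pd

# Made-up frame; the column name "price" is purely illustrative.
idx = pd.DatetimeIndex(
    ["2014-05-29", "2014-05-30", "2015-05-28", "2015-05-29", "2015-06-01"]
)
df = pd.DataFrame({"price": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

# Last row of every (year, month) group; rows keep their original index.
df_monthly = df.groupby([df.index.year, df.index.month]).tail(1)
print(df_monthly)
```

This assumes the frame is already sorted by date, since tail(1) takes the last row in positional order.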
Maybe the answer is not needed anymore, but while searching for an answer to the same question I found maybe a simpler solution:
import pandas as pd
sample_dates = pd.date_range(start='2010-01-01', periods=100, freq='B')
month_end_dates = sample_dates[sample_dates.is_month_end]
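One thing to check before relying on this: on a plain DatetimeIndex, is_month_end flags calendar month ends, so for a series with gaps (like the one in the question) it can miss the last available date. A small sketch on the question's sample dates, with a duplicated-based mask added as my own alternative:

```python
import pandas as pd

dates = pd.DatetimeIndex([
    "2015-05-18", "2015-05-19", "2015-05-20", "2015-05-21", "2015-05-22",
    "2015-05-26", "2015-05-27", "2015-05-28", "2015-05-29", "2015-06-01",
])

# None of the sample dates is the last calendar day of its month.
month_end_dates = dates[dates.is_month_end]

# Alternative (assumes a sorted index): keep the last occurrence of each
# monthly period, whatever calendar day it falls on.
last_per_month = dates[~dates.to_period("M").duplicated(keep="last")]

print(month_end_dates)
print(last_per_month)
```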
Try this: create a new diff column in which the value 1 marks a change from one month to the next.
df['diff'] = np.where(df['Date'].dt.month.diff() != 0,1,0)
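A quick sketch of what this flag marks, on a made-up frame with a "Date" column holding the question's sample dates. As far as I can tell, diff() compares each row with the previous one, so the 1s land on the first row of each month; the "is_last" column below is my own addition that uses diff(-1) to flag the last row instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime([
    "2015-05-18", "2015-05-19", "2015-05-20", "2015-05-21", "2015-05-22",
    "2015-05-26", "2015-05-27", "2015-05-28", "2015-05-29", "2015-06-01",
])})

# Compares with the previous row: flags the first row of each month
# (the very first row compares against NaN and is flagged too).
df["diff"] = np.where(df["Date"].dt.month.diff() != 0, 1, 0)

# Compares with the next row: flags the last row of each month
# (the very last row compares against NaN and is flagged too).
df["is_last"] = np.where(df["Date"].dt.month.diff(-1) != 0, 1, 0)

first_dates = df.loc[df["diff"] == 1, "Date"]
last_dates = df.loc[df["is_last"] == 1, "Date"]
print(last_dates)
```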