Add Months to Data Frame using a period column
I'm looking to add a %Y%m%d date column to my dataframe using a period column that holds integers 1-32. These represent monthly data points starting at a defined environment variable "odate" (e.g. if odate=20190531, then period 1 should be 20190531, period 2 should be 20190630, etc.).
I tried defining a dictionary with the periods in the column as the keys and odate + MonthEnd(period - 1) as the values.
This works fine; however, I want to make the code flexible to changes in the number of periods.
Is there a function that will let me fill the date column with odate for period 1 and then the subsequent month ends for subsequent periods?
Example dataset:
odate=20190531
period value
1 5.5
2 5
4 6.2
3 5
5 40
11 5
Desired dataset:
odate=20190531
period value date
1 5.5 2019-05-31
2 5 2019-06-30
4 6.2 2019-08-31
3 5 2019-07-31
5 40 2019-09-30
11 5 2020-03-31
You can use pd.date_range():
import pandas as pd
pd.date_range(start='2019-05-31', periods=100, freq='M')
You can change the total number of periods depending on what you need. freq='M' means a month-end frequency (pandas 2.2 and later spell this alias 'ME' and deprecate 'M').
See the list of Offset Aliases in the pandas documentation for the values the freq parameter accepts.
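One way to connect pd.date_range() to the period column is to build only as many month ends as the largest period needs and then look each period up. The following is a minimal sketch, not the answerer's own code: the sample frame is copied from the question, the names month_ends and lookup are made up for illustration, and odate is assumed to already fall on a month end. Passing pd.offsets.MonthEnd() as freq sidesteps the 'M' versus 'ME' alias difference between pandas versions.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({"period": [1, 2, 4, 3, 5, 11],
                   "value": [5.5, 5, 6.2, 5, 40, 5]})
odate = pd.Timestamp("20190531")  # assumed to be a month end

# Generate exactly as many month ends as the largest period requires,
# so nothing has to change when the number of periods changes.
n_periods = int(df["period"].max())
month_ends = pd.date_range(start=odate, periods=n_periods,
                           freq=pd.offsets.MonthEnd())

# Hypothetical lookup table: period 1 -> odate, period 2 -> next month end, ...
lookup = pd.Series(month_ends, index=range(1, n_periods + 1))
df["date"] = df["period"].map(lookup)

print(df)
```

Because the range is sized from df["period"].max(), the same code should cover 32 periods or any other count without edits.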
If you just want to add or subtract some period to a date you can use pd.DateOffset:
odate = pd.Timestamp('20191031')
odate
>> Timestamp('2019-10-31 00:00:00')
odate - pd.DateOffset(months=4)
>> Timestamp('2019-06-30 00:00:00')
odate + pd.DateOffset(months=4)
>> Timestamp('2020-02-29 00:00:00')
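Note that pd.DateOffset(months=n) shifts by calendar months and only clips the day when the target month is too short, so it does not always land on a month end. A small sketch of the difference from pd.offsets.MonthEnd, using an illustrative start date that is not from the question:

```python
import pandas as pd

start = pd.Timestamp("2019-02-28")  # illustrative date, last day of February 2019

# Calendar shift: keeps the day-of-month (28) where the target month allows it.
calendar_shift = start + pd.DateOffset(months=1)

# Month-end shift: moves to the end of the following month.
month_end_shift = start + pd.offsets.MonthEnd(1)

print(calendar_shift)   # 2019-03-28 00:00:00
print(month_end_shift)  # 2019-03-31 00:00:00
```

For a series of month-end dates like the one in the question, MonthEnd is therefore the safer choice.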
To add month ends according to the period column:
odate = pd.Timestamp('20190531')
df['date'] = df.period.apply(lambda x: odate + pd.offsets.MonthEnd(x-1))
df
period value date
0 1 5.5 2019-05-31
1 2 5.0 2019-06-30
2 4 6.2 2019-08-31
3 3 5.0 2019-07-31
4 5 40.0 2019-09-30
5 11 5.0 2020-03-31
To improve performance, use a list comprehension:
df['date'] = [odate + pd.offsets.MonthEnd(period-1) for period in df.period]
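The MonthEnd(period - 1) arithmetic relies on odate itself being a month end. If odate were mid-month, MonthEnd(0) and MonthEnd(1) would both roll forward to the end of that same month, so periods 1 and 2 would collide. The sketch below wraps the list comprehension in a helper that checks this assumption up front; the function name period_dates and the choice to raise ValueError are illustrative, not part of the original answer.

```python
import pandas as pd

def period_dates(periods, odate):
    """Hypothetical helper: period 1 maps to odate, period n to the
    (n - 1)th month end after it. Requires odate to be a month end."""
    odate = pd.Timestamp(odate)
    if not odate.is_month_end:
        raise ValueError(f"odate {odate.date()} is not a month end")
    return [odate + pd.offsets.MonthEnd(int(p) - 1) for p in periods]

# Sample data from the question.
df = pd.DataFrame({"period": [1, 2, 4, 3, 5, 11],
                   "value": [5.5, 5, 6.2, 5, 40, 5]})
df["date"] = period_dates(df["period"], "20190531")

# If a %Y%m%d string column is wanted instead of datetimes:
df["date_str"] = df["date"].dt.strftime("%Y%m%d")

print(df)
```

The helper takes odate as a string, so a value read from an environment variable can be passed in directly.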