[英]Resample Pandas time series at custom interval and get interval number within a year
I have a data frame similar to this, except that it extends over decades of data:我有一个与此类似的数据框,但它扩展了数十年的数据:
df = pd.DataFrame({'time':['2003-02-02', '2003-02-03', '2003-02-04', '2003-02-05', '2003-02-06', '2003-02-07', '2003-02-08', '2003-02-09','2003-02-10', '2003-02-11'], 'NDVI': [0.505413, 0.504566, 0.503682, 0.502759, 0.501796, 0.500791, 0.499743, 0.498651, 0.497514, 0.496332]})
df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d')
df.set_index('time', inplace=True)
Output:
NDVI time 2003-02-02 0.505413 2003-02-03 0.504566 2003-02-04 0.503682 2003-02-05 0.502759 2003-02-06 0.501796 2003-02-07 0.500791 2003-02-08 0.499743 2003-02-09 0.498651 2003-02-10 0.497514 2003-02-11 0.496332
I would like to:我想要:
NDVI
value at a custom time interval that starts from the beginning of every year.NDVI
值。 If the interval is eg 10 days, values will be binned as [Jan-1 : Jan-10], [Jan-11 : Jan-20] etc. The last interval of the year will have to be either a 5- or 6-day interval depending on being a leap year (ie 360th-365/6th day of the year). NDVI yr_interval time 2003-01-31 0.505413 4 2003-02-10 0.497514 5
In the above example, the first line represents the 4th 10-day interval of year 2003.在上面的示例中,第一行表示 2003 年的第 4 个 10 天间隔。
How to implement that, knowing that:如何实施,知道:
pandas.Series.dt.week
)?pandas.Series.dt.week
行为)?How about trying pandas.Series.dt.dayofyear
and divide that result by the interval you want?试试
pandas.Series.dt.dayofyear
并将结果除以你想要的间隔怎么样? This would be equivalent to pandas.Series.dt.week
if you used 7 as your interval.如果您使用 7 作为间隔,这将等同于
pandas.Series.dt.week
。
The proof is left as an exercise for the reader.证明留给读者作为练习。
This should work.这应该有效。 I don't know if there is any other efficient way to do this, but here is what it looks like.
我不知道是否有任何其他有效的方法可以做到这一点,但这是它的样子。 Here I have grouped the the dates by year and then group each year group by the time interval of 10 days and get the index of each 10 day group within the year group - so that the index resets for every year group
在这里,我按年份对日期进行分组,然后按 10 天的时间间隔对每个年份组进行分组,并获取年份组中每个 10 天组的索引 - 这样每个年份组的索引都会重置
Sample data frame - 'time' col has date values from 01/01/2005 to 12/31/2007 and 'NDVI' col has some random values.示例数据框 - 'time' col 具有从 01/01/2005 到 12/31/2007 的日期值,'NDVI' col 具有一些随机值。
np.random.seed(123)
df = pd.DataFrame({'time': pd.date_range('2005-01-01', '2007-12-31', freq='1D'),
'NDVI': np.random.randn(1095)})
df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d')
df.set_index('time', inplace=True)
The function assign_group_interval
takes each group (as df) and groups them by the time interval of 10 days and gets the yr_interval
value from the index of the grouped df.函数
assign_group_interval
将每个组(作为df)按10 天的时间间隔分组,并从分组的df 的索引中获取yr_interval
值。 This function will be applied to each year group so that the identifier resets for each year group.此功能将应用于每个年份组,以便为每个年份组重置标识符。
def assign_group_interval(year_grp):
year_grp['period'] = year_grp.groupby(pd.Grouper(freq='10D'), as_index=False)\
.apply(lambda x: x['NDVI'])\
.index.get_level_values(0)
return year_grp
Group the df
by year and apply the function assign_group_interval
按年份对
df
进行分组并应用函数assign_group_interval
print(df.groupby(df.index.year).apply(assign_group_interval).reset_index())
Output:输出:
time NDVI period
0 2005-01-01 -1.085631 0
1 2005-01-02 0.997345 0
2 2005-01-03 0.282978 0
3 2005-01-04 -1.506295 0
4 2005-01-05 -0.578600 0
5 2005-01-06 1.651437 0
6 2005-01-07 -2.426679 0
7 2005-01-08 -0.428913 0
8 2005-01-09 1.265936 0
9 2005-01-10 -0.866740 0
10 2005-01-11 -0.678886 1
11 2005-01-12 -0.094709 1
12 2005-01-13 1.491390 1
13 2005-01-14 -0.638902 1
14 2005-01-15 -0.443982 1
15 2005-01-16 -0.434351 1
16 2005-01-17 2.205930 1
17 2005-01-18 2.186786 1
18 2005-01-19 1.004054 1
19 2005-01-20 0.386186 1
20 2005-01-21 0.737369 2
21 2005-01-22 1.490732 2
22 2005-01-23 -0.935834 2
23 2005-01-24 1.175829 2
24 2005-01-25 -1.253881 2
25 2005-01-26 -0.637752 2
...
...
455 2006-04-01 0.104061 9
456 2006-04-02 0.165957 9
457 2006-04-03 1.601908 9
458 2006-04-04 0.058687 9
459 2006-04-05 1.064423 9
460 2006-04-06 -0.039329 9
461 2006-04-07 1.448904 9
462 2006-04-08 -1.870397 9
463 2006-04-09 -0.598732 9
464 2006-04-10 0.983033 9
465 2006-04-11 -0.171596 10
466 2006-04-12 0.931530 10
467 2006-04-13 0.385066 10
468 2006-04-14 0.945877 10
469 2006-04-15 0.613068 10
470 2006-04-16 0.673649 10
471 2006-04-17 1.492455 10
472 2006-04-18 0.986474 10
473 2006-04-19 0.993807 10
474 2006-04-20 0.020419 10
475 2006-04-21 -0.581850 11
476 2006-04-22 -0.659560 11
477 2006-04-23 0.750945 11
478 2006-04-24 -2.438461 11
479 2006-04-25 -1.307178 11
480 2006-04-26 -0.963254 11
...
...
830 2007-04-11 -0.482365 10
831 2007-04-12 1.079796 10
832 2007-04-13 -0.421079 10
833 2007-04-14 -1.166471 10
834 2007-04-15 0.856555 10
835 2007-04-16 -0.017391 10
836 2007-04-17 1.448577 10
837 2007-04-18 0.892200 10
838 2007-04-19 -0.229427 10
839 2007-04-20 -0.449668 10
840 2007-04-21 0.023372 11
841 2007-04-22 0.190210 11
842 2007-04-23 -0.881749 11
843 2007-04-24 0.841940 11
844 2007-04-25 -0.397363 11
845 2007-04-26 -0.423028 11
846 2007-04-27 -0.540688 11
847 2007-04-28 0.231017 11
848 2007-04-29 -0.692053 11
849 2007-04-30 0.134970 11
850 2007-05-01 2.766603 12
851 2007-05-02 -0.053609 12
852 2007-05-03 -0.434005 12
853 2007-05-04 -1.667689 12
854 2007-05-05 0.050222 12
855 2007-05-06 -1.109231 12
...
...
1070 2007-12-07 0.923065 34
1071 2007-12-08 -0.822000 34
1072 2007-12-09 1.607085 34
1073 2007-12-10 0.737825 34
1074 2007-12-11 -0.403760 34
1075 2007-12-12 -2.114548 34
1076 2007-12-13 -0.000311 34
1077 2007-12-14 -1.181809 34
1078 2007-12-15 0.299635 34
1079 2007-12-16 1.451169 34
1080 2007-12-17 0.160060 35
1081 2007-12-18 -0.178013 35
1082 2007-12-19 0.342205 35
1083 2007-12-20 0.285650 35
1084 2007-12-21 -2.362864 35
1085 2007-12-22 0.240937 35
1086 2007-12-23 0.620277 35
1087 2007-12-24 -0.259342 35
1088 2007-12-25 0.978559 35
1089 2007-12-26 -0.127675 35
1090 2007-12-27 0.766999 36
1091 2007-12-28 2.273105 36
1092 2007-12-29 -0.096391 36
1093 2007-12-30 -1.942132 36
1094 2007-12-31 -0.336592 36
@cwalvoort's solution is better @cwalvoort 的解决方案更好
df['yr_interval'] = df.index.dayofyear // 10
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.