Resample Pandas time series at custom interval and get interval number within a year
I have a dataframe similar to this one, but it spans decades of data:
import pandas as pd

df = pd.DataFrame({'time': ['2003-02-02', '2003-02-03', '2003-02-04', '2003-02-05', '2003-02-06',
                            '2003-02-07', '2003-02-08', '2003-02-09', '2003-02-10', '2003-02-11'],
                   'NDVI': [0.505413, 0.504566, 0.503682, 0.502759, 0.501796,
                            0.500791, 0.499743, 0.498651, 0.497514, 0.496332]})
df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d')
df.set_index('time', inplace=True)
Output:
                NDVI
time
2003-02-02  0.505413
2003-02-03  0.504566
2003-02-04  0.503682
2003-02-05  0.502759
2003-02-06  0.501796
2003-02-07  0.500791
2003-02-08  0.499743
2003-02-09  0.498651
2003-02-10  0.497514
2003-02-11  0.496332
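I would like to resample the NDVI values at a custom interval and get the interval number within the year. If the interval is, for example, 10 days, the values are binned into [Jan-1 : Jan-10], [Jan-11 : Jan-20], and so on. The last interval of the year has to be a 5- or 6-day interval, depending on whether it is a leap year (i.e. days 361 to 365/366 of the year). For example:
                NDVI  yr_interval
time
2003-01-31  0.505413            4
2003-02-10  0.497514            5
In the example above, the first row represents the 4th 10-day interval of 2003.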
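How can this be implemented (similar to the pandas.Series.dt.week behavior)?

How about trying pandas.Series.dt.dayofyear and dividing the result by the interval you want? With an interval of 7, this is roughly equivalent to pandas.Series.dt.week.
Proof left as an exercise for the reader.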
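The suggestion above can be sketched as follows. This is a minimal illustration, not the answerer's code: the helper name year_interval and its days parameter are made up for this example, and it assumes intervals are numbered from 1, as in the question's expected output. Because dayofyear starts at 1, the sketch subtracts 1 before the floor division so that Jan-1 through Jan-10 land in the same interval.

```python
import pandas as pd


def year_interval(index: pd.DatetimeIndex, days: int = 10) -> pd.Index:
    """Hypothetical helper: 1-based interval number within the year."""
    # dayofyear is 1-based, so shift to 0-based before the floor division,
    # then shift the result back to 1-based interval numbers.
    return (index.dayofyear - 1) // days + 1


idx = pd.to_datetime(['2003-01-01', '2003-01-10', '2003-01-11',
                      '2003-02-02', '2003-02-10', '2003-12-31'])
intervals = year_interval(idx)
print(list(intervals))
```

With 10-day intervals, 2003-02-02 falls in interval 4 and 2003-02-10 in interval 5, which matches the numbering in the question; the last interval of a non-leap year (number 37) covers only 5 days.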
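This should work. I don't know whether there is a more efficient way to do it, but here is what it looks like. I group the dates by year, then group each year group into 10-day intervals, and take the index of each 10-day group within the year group, so that the index resets for every year.
Sample dataframe: the 'time' column has dates from 01/01/2005 to 12/31/2007, and the 'NDVI' column has some random values.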
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({'time': pd.date_range('2005-01-01', '2007-12-31', freq='1D'),
                   'NDVI': np.random.randn(1095)})
df['time'] = pd.to_datetime(df['time'], format='%Y-%m-%d')
df.set_index('time', inplace=True)
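The function assign_group_interval takes each year group (as a df), groups it into 10-day intervals, and reads the interval number from the index of the grouped df, storing it in the 'period' column. The function is applied to each year group so that the identifier resets for every year.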
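def assign_group_interval(year_grp):
    # Bin this year's rows into consecutive 10-day groups; level 0 of the
    # resulting index numbers those groups 0, 1, 2, ...
    year_grp['period'] = year_grp.groupby(pd.Grouper(freq='10D'), as_index=False)\
                                 .apply(lambda x: x['NDVI'])\
                                 .index.get_level_values(0)
    return year_grp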
Group df by year and apply the function assign_group_interval:
print(df.groupby(df.index.year).apply(assign_group_interval).reset_index())
Output:
time NDVI period
0 2005-01-01 -1.085631 0
1 2005-01-02 0.997345 0
2 2005-01-03 0.282978 0
3 2005-01-04 -1.506295 0
4 2005-01-05 -0.578600 0
5 2005-01-06 1.651437 0
6 2005-01-07 -2.426679 0
7 2005-01-08 -0.428913 0
8 2005-01-09 1.265936 0
9 2005-01-10 -0.866740 0
10 2005-01-11 -0.678886 1
11 2005-01-12 -0.094709 1
12 2005-01-13 1.491390 1
13 2005-01-14 -0.638902 1
14 2005-01-15 -0.443982 1
15 2005-01-16 -0.434351 1
16 2005-01-17 2.205930 1
17 2005-01-18 2.186786 1
18 2005-01-19 1.004054 1
19 2005-01-20 0.386186 1
20 2005-01-21 0.737369 2
21 2005-01-22 1.490732 2
22 2005-01-23 -0.935834 2
23 2005-01-24 1.175829 2
24 2005-01-25 -1.253881 2
25 2005-01-26 -0.637752 2
...
...
455 2006-04-01 0.104061 9
456 2006-04-02 0.165957 9
457 2006-04-03 1.601908 9
458 2006-04-04 0.058687 9
459 2006-04-05 1.064423 9
460 2006-04-06 -0.039329 9
461 2006-04-07 1.448904 9
462 2006-04-08 -1.870397 9
463 2006-04-09 -0.598732 9
464 2006-04-10 0.983033 9
465 2006-04-11 -0.171596 10
466 2006-04-12 0.931530 10
467 2006-04-13 0.385066 10
468 2006-04-14 0.945877 10
469 2006-04-15 0.613068 10
470 2006-04-16 0.673649 10
471 2006-04-17 1.492455 10
472 2006-04-18 0.986474 10
473 2006-04-19 0.993807 10
474 2006-04-20 0.020419 10
475 2006-04-21 -0.581850 11
476 2006-04-22 -0.659560 11
477 2006-04-23 0.750945 11
478 2006-04-24 -2.438461 11
479 2006-04-25 -1.307178 11
480 2006-04-26 -0.963254 11
...
...
830 2007-04-11 -0.482365 10
831 2007-04-12 1.079796 10
832 2007-04-13 -0.421079 10
833 2007-04-14 -1.166471 10
834 2007-04-15 0.856555 10
835 2007-04-16 -0.017391 10
836 2007-04-17 1.448577 10
837 2007-04-18 0.892200 10
838 2007-04-19 -0.229427 10
839 2007-04-20 -0.449668 10
840 2007-04-21 0.023372 11
841 2007-04-22 0.190210 11
842 2007-04-23 -0.881749 11
843 2007-04-24 0.841940 11
844 2007-04-25 -0.397363 11
845 2007-04-26 -0.423028 11
846 2007-04-27 -0.540688 11
847 2007-04-28 0.231017 11
848 2007-04-29 -0.692053 11
849 2007-04-30 0.134970 11
850 2007-05-01 2.766603 12
851 2007-05-02 -0.053609 12
852 2007-05-03 -0.434005 12
853 2007-05-04 -1.667689 12
854 2007-05-05 0.050222 12
855 2007-05-06 -1.109231 12
...
...
1070 2007-12-07 0.923065 34
1071 2007-12-08 -0.822000 34
1072 2007-12-09 1.607085 34
1073 2007-12-10 0.737825 34
1074 2007-12-11 -0.403760 34
1075 2007-12-12 -2.114548 34
1076 2007-12-13 -0.000311 34
1077 2007-12-14 -1.181809 34
1078 2007-12-15 0.299635 34
1079 2007-12-16 1.451169 34
1080 2007-12-17 0.160060 35
1081 2007-12-18 -0.178013 35
1082 2007-12-19 0.342205 35
1083 2007-12-20 0.285650 35
1084 2007-12-21 -2.362864 35
1085 2007-12-22 0.240937 35
1086 2007-12-23 0.620277 35
1087 2007-12-24 -0.259342 35
1088 2007-12-25 0.978559 35
1089 2007-12-26 -0.127675 35
1090 2007-12-27 0.766999 36
1091 2007-12-28 2.273105 36
1092 2007-12-29 -0.096391 36
1093 2007-12-30 -1.942132 36
1094 2007-12-31 -0.336592 36
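The per-year reset shown in the output above can also be sketched without apply. This is an alternative technique, not the answer's method: it numbers the rows within each year with cumcount and floor-divides by 10, and it assumes exactly one row per day with no gaps (otherwise row position and day of year drift apart). The random values are regenerated here with a different generator, so they will not match the numbers printed above.

```python
import numpy as np
import pandas as pd

# Same date range as the sample dataframe; the NDVI values are illustrative.
idx = pd.date_range('2005-01-01', '2007-12-31', freq='D')
rng = np.random.default_rng(123)
df = pd.DataFrame({'NDVI': rng.standard_normal(len(idx))}, index=idx)
df.index.name = 'time'

# cumcount() numbers the rows 0, 1, 2, ... within each year group, so the
# floor division yields a 0-based 10-day period that restarts every Jan-1.
df['period'] = df.groupby(df.index.year).cumcount() // 10

# Inspect the rows around a year boundary to see the reset.
print(df.loc['2005-12-27':'2006-01-11', 'period'])
```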
@cwalvoort's solution is better:
df['yr_interval'] = df.index.dayofyear // 10
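One detail worth checking with the one-liner: dayofyear starts at 1, so dayofyear // 10 puts Jan-10 in the same interval as Jan-11 through Jan-19, while the question asks for [Jan-1 : Jan-10]. The sketch below is one possible adjustment, not part of the original answer: it subtracts 1 first and then aggregates NDVI per year and interval, which covers the resampling half of the question. The names interval_days, year and summary are chosen for this example only, and the intervals are 0-based to match the 'period' output above.

```python
import numpy as np
import pandas as pd

# Illustrative data: three non-leap years of daily random NDVI values.
idx = pd.date_range('2005-01-01', '2007-12-31', freq='D')
rng = np.random.default_rng(123)
df = pd.DataFrame({'NDVI': rng.standard_normal(len(idx))}, index=idx)
df.index.name = 'time'

interval_days = 10  # assumed interval length
df['year'] = df.index.year
# Shift dayofyear to 0-based so that days 1-10 share interval 0.
df['yr_interval'] = (df.index.dayofyear - 1) // interval_days

# Mean NDVI and number of days per (year, interval) bin.
summary = df.groupby(['year', 'yr_interval'])['NDVI'].agg(['mean', 'size'])
print(summary.tail(3))
```

In this sketch the last interval of each year (number 36) holds the remaining 5 days, or 6 in a leap year, so no separate handling of the year end is needed.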