简体   繁体   English

取 N 最后天的最小值

[英]Taking the min value of N last days

I have this data frame:我有这个数据框:

ID      Date  X  123_Var  456_Var  789_Var
 A  16-07-19  3      777      250      810
 A  17-07-19  9      637      121      529
 A  20-07-19  2      295      272      490
 A  21-07-19  3      778      600      544
 A  22-07-19  6      741      792      907
 A  25-07-19  6      435      416      820
 A  26-07-19  8      590      455      342
 A  27-07-19  6      763      476      753
 A  02-08-19  6      717      211      454
 A  03-08-19  6      152      442      475
 A  05-08-19  6      564      340      302
 A  07-08-19  6      105      929      633
 A  08-08-19  6      948      366      586
 B  07-08-19  4      509      690      406
 B  08-08-19  2      413      725      414
 B  12-08-19  2      170      702      912
 B  13-08-19  3      851      616      477
 B  14-08-19  9      475      447      555
 B  15-08-19  1      412      403      708
 B  17-08-19  2      299      537      321
 B  18-08-19  4      310      119      125

I want to show the min value of n last days (say, n = 4 ), using Date column, excluding the value of current day .我想显示n最后天的min (例如, n = 4 ),使用Date列,不包括 current day 的值

A similar solution has provided by jezrael . jezrael提供了类似的解决方案。 (That one calculates the mean , and not min .) (那个计算的mean ,而不是min 。)

Expected result:预期结果:

ID      Date  X  123_Var  456_Var  789_Var  123_Var_4  456_Var_4  789_Var_4
 A  16-07-19  3      777      250      810        NaN        NaN        NaN
 A  17-07-19  9      637      121      529      777.0      250.0      810.0
 A  20-07-19  2      295      272      490      637.0      121.0      529.0
 A  21-07-19  3      778      600      544      295.0      121.0      490.0
 A  22-07-19  6      741      792      907      295.0      272.0      490.0
 A  25-07-19  6      435      416      820      741.0      600.0      544.0
 A  26-07-19  8      590      455      342      435.0      416.0      820.0
 A  27-07-19  6      763      476      753      435.0      416.0      342.0
 A  02-08-19  6      717      211      454        NaN        NaN        NaN
 A  03-08-19  6      152      442      475      717.0      211.0      454.0
 A  05-08-19  6      564      340      302      152.0      211.0      454.0
 A  07-08-19  6      105      929      633      152.0      340.0      302.0
 A  08-08-19  6      948      366      586      105.0      340.0      302.0
 B  07-08-19  4      509      690      406        NaN        NaN        NaN
 B  08-08-19  2      413      725      414      509.0      690.0      406.0
 B  12-08-19  2      170      702      912      413.0      725.0      414.0
 B  13-08-19  3      851      616      477      170.0      702.0      414.0
 B  14-08-19  9      475      447      555      170.0      616.0      477.0
 B  15-08-19  1      412      403      708      170.0      447.0      477.0
 B  17-08-19  2      299      537      321      412.0      403.0      477.0
 B  18-08-19  4      310      119      125      299.0      403.0      321.0

Use similar solution like @Chris with custom lambda function in GroupBy.apply and last join to original by DataFrame.join :使用类似GroupBy.apply和 GroupBy.apply 中的自定义 lambda 函数的类似解决方案,最后通过DataFrame.join加入原始DataFrame.join

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

n = 4
cols = df.filter(regex='Var').columns
f = lambda x: x.asfreq('d').rolling(window=f'{n+1}D',closed="left")[cols].min()

df2 = (df.set_index('Date')
         .groupby('ID').apply(f)
         .add_suffix(f'_{n}'))
df = df.join(df2, on=['ID','Date'])

print (df)
   ID       Date  X  123_Var  456_Var  789_Var  123_Var_4  456_Var_4  \
0   A 2019-07-16  3      777      250      810        NaN        NaN   
1   A 2019-07-17  9      637      121      529      777.0      250.0   
2   A 2019-07-20  2      295      272      490      637.0      121.0   
3   A 2019-07-21  3      778      600      544      295.0      121.0   
4   A 2019-07-22  6      741      792      907      295.0      121.0   
5   A 2019-07-25  6      435      416      820      295.0      272.0   
6   A 2019-07-26  8      590      455      342      435.0      416.0   
7   A 2019-07-27  6      763      476      753      435.0      416.0   
8   A 2019-08-02  6      717      211      454        NaN        NaN   
9   A 2019-08-03  6      152      442      475      717.0      211.0   
10  A 2019-08-05  6      564      340      302      152.0      211.0   
11  A 2019-08-07  6      105      929      633      152.0      211.0   
12  A 2019-08-08  6      948      366      586      105.0      340.0   
13  B 2019-08-07  4      509      690      406        NaN        NaN   
14  B 2019-08-08  2      413      725      414      509.0      690.0   
15  B 2019-08-12  2      170      702      912      413.0      690.0   
16  B 2019-08-13  3      851      616      477      170.0      702.0   
17  B 2019-08-14  9      475      447      555      170.0      616.0   
18  B 2019-08-15  1      412      403      708      170.0      447.0   
19  B 2019-08-17  2      299      537      321      170.0      403.0   
20  B 2019-08-18  4      310      119      125      299.0      403.0   

    789_Var_4  
0         NaN  
1       810.0  
2       529.0  
3       490.0  
4       490.0  
5       490.0  
6       544.0  
7       342.0  
8         NaN  
9       454.0  
10      454.0  
11      302.0  
12      302.0  
13        NaN  
14      406.0  
15      406.0  
16      414.0  
17      477.0  
18      477.0  
19      477.0  
20      321.0  

One way using groupby and rolling :使用groupbyrolling一种方法:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
dfs = []
for k, d in df.set_index('Date').groupby('ID'):
    tmp = d.asfreq('1 D').filter(like='_Var').rolling('4D',closed="left").min().add_suffix("_4")
    dfs.append(d.merge(tmp, left_index=True, right_index=True))
new_df = pd.concat(dfs).reset_index()
print(new_df)

Output:输出:

         Date ID  X  123_Var  456_Var  789_Var  123_Var_4  456_Var_4  789_Var_4
0  2019-07-16  A  3      777      250      810        NaN        NaN        NaN
1  2019-07-17  A  9      637      121      529      777.0      250.0      810.0
2  2019-07-20  A  2      295      272      490      637.0      121.0      529.0
3  2019-07-21  A  3      778      600      544      295.0      121.0      490.0
4  2019-07-22  A  6      741      792      907      295.0      272.0      490.0
5  2019-07-25  A  6      435      416      820      741.0      600.0      544.0
6  2019-07-26  A  8      590      455      342      435.0      416.0      820.0
7  2019-07-27  A  6      763      476      753      435.0      416.0      342.0
8  2019-08-02  A  6      717      211      454        NaN        NaN        NaN
9  2019-08-03  A  6      152      442      475      717.0      211.0      454.0
10 2019-08-05  A  6      564      340      302      152.0      211.0      454.0
11 2019-08-07  A  6      105      929      633      152.0      340.0      302.0
12 2019-08-08  A  6      948      366      586      105.0      340.0      302.0
13 2019-08-07  B  4      509      690      406        NaN        NaN        NaN
14 2019-08-08  B  2      413      725      414      509.0      690.0      406.0
15 2019-08-12  B  2      170      702      912      413.0      725.0      414.0
16 2019-08-13  B  3      851      616      477      170.0      702.0      912.0
17 2019-08-14  B  9      475      447      555      170.0      616.0      477.0
18 2019-08-15  B  1      412      403      708      170.0      447.0      477.0
19 2019-08-17  B  2      299      537      321      412.0      403.0      477.0
20 2019-08-18  B  4      310      119      125      299.0      403.0      321.0

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM