
Write a generic function to calculate a column based on a specific condition on another column's value in pandas

I have a df as shown below.

Date                t_factor     time_in_days
2020-02-01             5             1
2020-02-06             14            6
2020-02-09             23            9
2020-02-03             23            3       
2020-03-11             38            40         
2020-02-20             29            20               
2020-02-13             30            13           
2020-02-29             100           29           
2020-03-26             70            55 
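
For anyone who wants to reproduce this, the frame above can be rebuilt roughly like this. It is only a sketch: `Date` is kept as plain strings here, which is an assumption, since the post shows the printed values but not the dtypes.

```python
import pandas as pd

# Rebuild the sample frame from the values printed above.
# Date is left as strings; use pd.to_datetime(df["Date"]) if real dates are needed.
df = pd.DataFrame({
    "Date": [
        "2020-02-01", "2020-02-06", "2020-02-09", "2020-02-03", "2020-03-11",
        "2020-02-20", "2020-02-13", "2020-02-29", "2020-03-26",
    ],
    "t_factor": [5, 14, 23, 23, 38, 29, 30, 100, 70],
    "time_in_days": [1, 6, 9, 3, 40, 20, 13, 29, 55],
})
```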

From this I would like to create a function that calculates a column named t_function based on the values in time_in_days.

Conditions for t_function:

if T0 <= time_in_days < T1:
    t_function = (a1 * time_in_days) + a0
elif T1 < time_in_days <= T2:
    t_function = 14
else:
    t_function = a2 * time_in_days**2 + (a1 * time_in_days) + a0
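
Read literally, the conditions above correspond to the following plain-Python function for a single value. This is a sketch for checking individual rows by hand; the name `t_function_scalar` is made up for illustration. Note that a value exactly equal to T1 matches neither of the first two branches, so it ends up in the quadratic branch.

```python
def t_function_scalar(t, T0, T1, T2, a0, a1, a2):
    """Literal, one-value-at-a-time reading of the conditions (hypothetical helper)."""
    if T0 <= t < T1:
        return a1 * t + a0
    elif T1 < t <= T2:
        return 14
    else:
        # Also reached when t == T1 or t < T0, since neither branch above matches.
        return a2 * t**2 + a1 * t + a0


params = dict(T0=1, T1=4, T2=35, a0=3, a1=2, a2=1)
print(t_function_scalar(3, **params))   # 9
print(t_function_scalar(4, **params))   # 27 (t == T1 falls through to the quadratic)
print(t_function_scalar(40, **params))  # 1683
```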

The function inputs would be (the dataframe, time_in_days, T0, T1, T2, a0, a1, a2).

Expected output:

With T0=1, T1=4, T2=35, a0=3, a1=2, a2=1:


Date                t_factor     time_in_days     t_function
2020-02-01             5             1            5
2020-02-06             14            6            14
2020-02-09             23            9            14
2020-02-03             23            3            9 
2020-03-11             38            40           1683           
2020-02-20             29            20           14     
2020-02-13             30            13           14  
2020-02-29             100           29           14
2020-03-26             70            55           3138

I tried the code below.

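import numpy as np

def t_function_df(df, time_in_days, T0, T1, T2, a0, a1, a2):
    df = df.copy()
    # Note: .le(T1) puts T1 in the first branch, unlike the "< T1" in the conditions;
    # the time_in_days argument is unused because the column name is hard-coded.
    df['t_function'] = np.select(
        (df['time_in_days'].ge(T0) & df['time_in_days'].le(T1),
         df['time_in_days'].gt(T1) & df['time_in_days'].le(T2)),
        (df['time_in_days'] * a1 + a0, 14),
        (a2 * df['time_in_days']**2) + df['time_in_days'] * a1 + a0)

    return df[['Date', 'time_in_days', 't_function']]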

Use a custom function fx with np.select:

import numpy as np

def fx(days, T0, T1, T2, a0, a1, a2):
    # Conditions are checked in order; rows matching neither get the quadratic default.
    return np.select([days.ge(T0) & days.lt(T1), days.gt(T1) & days.le(T2)],
                     [a1*days + a0, 14], a2*(days)**2 + (a1*days) + a0)

df['t_function'] = fx(df['time_in_days'], T0=1, T1=4, T2=35, a0=3, a1=2, a2=1)

Result:

         Date  t_factor  time_in_days  t_function
0  2020-02-01         5             1           5
1  2020-02-06        14             6          14
2  2020-02-09        23             9          14
3  2020-02-03        23             3           9
4  2020-03-11        38            40        1683
5  2020-02-20        29            20          14
6  2020-02-13        30            13          14
7  2020-02-29       100            29          14
8  2020-03-26        70            55        3138
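
If you want the function to take the frame itself, as the question asks, the same np.select call can be wrapped along these lines. This is only a sketch: `add_t_function`, the `col` and `out` arguments, and the `mid` constant are names I made up. The T1 boundary follows the pseudocode, so a value equal to T1 lands in the quadratic branch.

```python
import numpy as np
import pandas as pd


def add_t_function(df, col, T0, T1, T2, a0, a1, a2, mid=14, out="t_function"):
    """Return a copy of df with the piecewise column added (hypothetical helper)."""
    days = df[col]
    conditions = [days.ge(T0) & days.lt(T1), days.gt(T1) & days.le(T2)]
    choices = [a1 * days + a0, mid]
    default = a2 * days**2 + a1 * days + a0
    # assign() returns a new frame, so the caller's df is left untouched.
    return df.assign(**{out: np.select(conditions, choices, default)})


sample = pd.DataFrame({"time_in_days": [1, 6, 9, 3, 40, 20, 13, 29, 55]})
result = add_t_function(sample, "time_in_days", T0=1, T1=4, T2=35, a0=3, a1=2, a2=1)
print(result["t_function"].tolist())
# [5, 14, 14, 9, 1683, 14, 14, 14, 3138]
```

Passing the column name instead of hard-coding it is what makes the helper reusable for other columns or frames.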

np.select is the obvious solution, but since you already have the pseudocode here, why not just run it!

if T0 <= time_in_days < T1:
    t_function = (a1 * time_in_days) + a0
elif T1 < time_in_days <= T2:
    t_function = 14
else:
    t_function = a2 * time_in_days ** 2 + (a1 * time_in_days) + a0

now becomes,

T0 = 1; T1 = 4; T2 = 35; a0 = 3; a1 = 2; a2 = 1
condlist = ["@T0 <= time_in_days < @T1", "@T1 < time_in_days <= @T2"]
choicelist = ["(@a1 * time_in_days) + @a0", "14"]
default = "@a2 * time_in_days ** 2 + (@a1 * time_in_days) + @a0"

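The "@" symbol tells eval to look up the actual Python variable in memory rather than a column. You can now combine the functionality of numpy and pandas like this: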

np.select(condlist=[df.eval(c) for c in condlist], 
          choicelist=[df.eval(q) for q in choicelist], 
          default=df.eval(default))  
# array([   5,   14,   14,    9, 1683,   14,   14,   14, 3138], dtype=int64)

df['t_function_actual'] = np.select(
              condlist=[df.eval(c) for c in condlist], 
              choicelist=[df.eval(q) for q in choicelist], 
              default=df.eval(default))  
df     
         Date  t_factor  time_in_days  t_function  t_function_actual
0  2020-02-01         5             1           5                  5
1  2020-02-06        14             6          14                 14
2  2020-02-09        23             9          14                 14
3  2020-02-03        23             3           9                  9
4  2020-03-11        38            40        1683               1683
5  2020-02-20        29            20          14                 14
6  2020-02-13        30            13          14                 14
7  2020-02-29       100            29          14                 14
8  2020-03-26        70            55        3138               3138
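
One thing to watch if you move this into a function: "@name" is resolved from the calling scope, and on some Python versions a list comprehension can hide the function's local variables from that lookup. Passing the parameters explicitly through `local_dict` (a keyword that `df.eval` forwards to `pd.eval`) sidesteps that. A rough sketch, where `eval_select` is a made-up name and the quadratic term is read as a2 * time_in_days ** 2 (an assumption that agrees with the expected output):

```python
import numpy as np
import pandas as pd


def eval_select(df, condlist, choicelist, default, **params):
    """Evaluate string conditions/choices against df, then combine with np.select."""

    def ev(expr):
        # local_dict supplies the values behind the @names in the expressions.
        return df.eval(expr, local_dict=params)

    return np.select([ev(c) for c in condlist],
                     [ev(q) for q in choicelist],
                     default=ev(default))


sample = pd.DataFrame({"time_in_days": [1, 6, 9, 3, 40, 20, 13, 29, 55]})
out = eval_select(
    sample,
    condlist=["@T0 <= time_in_days < @T1", "@T1 < time_in_days <= @T2"],
    choicelist=["@a1 * time_in_days + @a0", "14"],
    default="@a2 * time_in_days ** 2 + @a1 * time_in_days + @a0",
    T0=1, T1=4, T2=35, a0=3, a1=2, a2=1,
)
print(out.tolist())
# [5, 14, 14, 9, 1683, 14, 14, 14, 3138]
```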

Read more about eval in my post: Dynamic Expression Evaluation in pandas using pd.eval()
