Calculate a specific function from the Date column and input parameters specified by the user in Pandas

I have a df as shown below.

Date                t_factor     
2020-02-01             5             
2020-02-03             23              
2020-02-06             14           
2020-02-09             23
2020-02-10             23  
2020-02-11             23          
2020-02-13             30            
2020-02-20             29            
2020-02-29             100
2020-03-01             38
2020-03-10             38               
2020-03-11             38                    
2020-03-26             70           
2020-03-29             70 

       

From that, I would like to create a function that calculates a column called t_function from the computed values t1, t2 and t3.

The user will enter the following parameters:

Step1:
Enter start_date1 = 2020-02-01
Enter end_date1 =  2020-02-06
Enter a0 = 3
Enter a1 = 1
Enter a2 = 0

Calculate t1 as the number of days from start_date1 (2020-02-01) to each value in the Date column, up to end_date1, with start_date1 itself counted as day 1.
t_function = a0 + a1*t1 + a2*(t1)**2
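For step 1 on its own, a minimal vectorized sketch: count start_date1 as day 1, blank out rows after end_date1, then apply the quadratic. For brevity it rebuilds only the first four dates of the table above (t_factor is left out because it is unused); the variable names mirror the question's parameters.

```python
import pandas as pd

# First four dates from the question's table.
df = pd.DataFrame({"Date": pd.to_datetime(
    ["2020-02-01", "2020-02-03", "2020-02-06", "2020-02-09"])})

start_date1 = pd.Timestamp("2020-02-01")
end_date1 = pd.Timestamp("2020-02-06")
a0, a1, a2 = 3, 1, 0

# start_date1 is day 1; .where() turns dates outside the window into NaN.
in_window = df["Date"].between(start_date1, end_date1)
df["t1"] = ((df["Date"] - start_date1).dt.days + 1).where(in_window)
df["t_function"] = a0 + a1 * df["t1"] + a2 * df["t1"] ** 2
print(df)
```

`Series.between` is inclusive on both ends by default, which matches "from start_date1 ... till end_date1".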

Step2:
Enter start_date2 = 2020-02-13
Enter end_date2 =  2020-02-29
Enter a0 = 2
Enter a1 = 0
Enter a2 = 1
Calculate time_in_days as t2, which is 1 on start_date2 (2020-02-13) and increases by one per day until end_date2.
t_function = a0 + a1*t2 + a2*(t2)**2
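As a quick check of the step 2 numbers in the expected output, the day index and the quadratic can be evaluated directly for the three dates that fall inside the window:

```python
import pandas as pd

start_date2 = pd.Timestamp("2020-02-13")
a0, a1, a2 = 2, 0, 1

rows = []
for d in ["2020-02-13", "2020-02-20", "2020-02-29"]:
    # start_date2 is day 1, so add 1 to the number of elapsed days.
    t2 = (pd.Timestamp(d) - start_date2).days + 1
    rows.append((d, t2, a0 + a1 * t2 + a2 * t2 ** 2))

for row in rows:
    print(row)
# ('2020-02-13', 1, 3)
# ('2020-02-20', 8, 66)
# ('2020-02-29', 17, 291)
```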


Step3:
Enter start_date3 = 2020-03-11
Enter end_date3 =  2020-03-29
Enter a0 = 4
Enter a1 = 0
Enter a2 = 0
Calculate time_in_days as t3, which is 1 on start_date3 (2020-03-11) and increases by one per day until end_date3.
t_function = a0 + a1*t3 + a2*(t3)**2
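All three steps share one formula. Written as a single piecewise definition for k = 1, 2, 3 (window k runs from start_k to end_k inclusive, the start date counts as day 1, and the NaN branch mirrors the rows left empty in the expected output):

```latex
t_k(d) = \frac{d - \mathrm{start}_k}{1\ \mathrm{day}} + 1,
\qquad \mathrm{start}_k \le d \le \mathrm{end}_k

\mathrm{t\_function}(d) =
\begin{cases}
a_0^{(k)} + a_1^{(k)}\, t_k(d) + a_2^{(k)}\, t_k(d)^2 & \text{if } \mathrm{start}_k \le d \le \mathrm{end}_k \\
\mathrm{NaN} & \text{if } d \text{ lies in no window}
\end{cases}
```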

Expected output:

Date                t_factor     t1         t2         t3       t_function
2020-02-01             5          1         NaN        NaN      4
2020-02-03             23         3         NaN        NaN      6
2020-02-06             14         6         NaN        NaN      9
2020-02-09             23         NaN       NaN        NaN      NaN
2020-02-10             23         NaN       NaN        NaN      NaN
2020-02-11             23         NaN       NaN        NaN      NaN
2020-02-13             30         NaN        1         NaN      3   
2020-02-20             29         NaN        8         NaN      66
2020-02-29             100        NaN        17        NaN      291
2020-03-01             38         NaN       NaN        NaN      NaN
2020-03-10             38         NaN       NaN        NaN      NaN
2020-03-11             38         NaN       NaN        1        4 
2020-03-26             70         NaN       NaN        16       4
2020-03-29             70         NaN       NaN        19       4

Note: The initial start date (start_date1) should be the first date in the Date column, and the final end date (end_date3) should be the last date in the Date column. The t_factor column is not used.
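The note's two constraints can be checked before any computation. A small sketch, assuming the steps are held as tuples whose first two items are the start and end dates (the helper name steps_cover_dates is hypothetical; extra items such as a0, a1, a2 are ignored):

```python
import pandas as pd


def steps_cover_dates(df, steps):
    """Hypothetical check: the first step must start on the first date in
    df["Date"], and the last step must end on the last date."""
    first_ok = pd.Timestamp(steps[0][0]) == df["Date"].min()
    last_ok = pd.Timestamp(steps[-1][1]) == df["Date"].max()
    return bool(first_ok and last_ok)


df = pd.DataFrame({"Date": pd.to_datetime(
    ["2020-02-01", "2020-02-13", "2020-03-11", "2020-03-29"])})
steps = [("2020-02-01", "2020-02-06"),
         ("2020-02-13", "2020-02-29"),
         ("2020-03-11", "2020-03-29")]
print(steps_cover_dates(df, steps))  # True
```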

I tried the code below to calculate t1, but I am stuck after that, since I am new to Python and pandas:

df['t1'] = (df['Date'] - df.at[0, 'Date']).dt.days + 1
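The attempt above counts days from the first row's date but does not stop at end_date1 or handle the later steps. A sketch that builds on the same expression and loops over all three steps, assuming each step is passed as a (start, end, a0, a1, a2) tuple and that the windows do not overlap. The helper name add_t_columns is hypothetical, not a pandas API.

```python
import pandas as pd


def add_t_columns(df, steps):
    """Hypothetical helper: add one t<i> column per step plus t_function.

    steps is a list of (start, end, a0, a1, a2) tuples; each window is
    inclusive and the windows are assumed not to overlap.
    """
    out = df.copy()
    t_function = pd.Series(float("nan"), index=out.index)
    for i, (start, end, a0, a1, a2) in enumerate(steps, start=1):
        start, end = pd.Timestamp(start), pd.Timestamp(end)
        in_window = out["Date"].between(start, end)
        # Same idea as the attempt: elapsed days + 1, so start is day 1.
        # .where() turns rows outside this step's window into NaN.
        t = ((out["Date"] - start).dt.days + 1).where(in_window)
        out[f"t{i}"] = t
        # Overwrite t_function only on rows inside this step's window.
        t_function = t_function.where(~in_window, a0 + a1 * t + a2 * t**2)
    out["t_function"] = t_function
    return out


dates = ["2020-02-01", "2020-02-03", "2020-02-06", "2020-02-09",
         "2020-02-10", "2020-02-11", "2020-02-13", "2020-02-20",
         "2020-02-29", "2020-03-01", "2020-03-10", "2020-03-11",
         "2020-03-26", "2020-03-29"]
df = pd.DataFrame({"Date": pd.to_datetime(dates)})
steps = [("2020-02-01", "2020-02-06", 3, 1, 0),
         ("2020-02-13", "2020-02-29", 2, 0, 1),
         ("2020-03-11", "2020-03-29", 4, 0, 0)]
result = add_t_columns(df, steps)
print(result)
```

With the start date counted as day 1, this gives t3 = 16 and 19 on 2020-03-26 and 2020-03-29; t_function is 4 on those rows either way, because a1 = a2 = 0 in step 3. Rows that fall in no window keep NaN in t_function.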

Here is how I would go about it:

import pandas as pd
from io import StringIO
from datetime import datetime
import numpy as np

df = pd.read_csv(StringIO("""Date                t_factor
2020-02-01             5
2020-02-03             23
2020-02-06             14
2020-02-09             23
2020-02-13             30
2020-02-20             29
2020-02-29             100
2020-03-11             38
2020-03-26             70
2020-03-29             70"""), sep=r"\s+", parse_dates=[0])
df

def fun(x, start="2020-02-01", end="2020-02-06", a0=3, a1=1, a2=0):
    # Return a0 + a1*t + a2*t**2 for one row, where t counts `start` as day 1;
    # rows whose Date lies outside [start, end] get NaN.
    start = datetime.strptime(start, "%Y-%m-%d")
    end = datetime.strptime(end, "%Y-%m-%d")
    if start <= x.Date <= end:
        t = (x.Date - start)/np.timedelta64(1, 'D') + 1
        value = a0 + a1*t + a2*t**2
    else:
        value = np.nan
    return value

# Each column holds that step's t_function value (not the day count).
df["t1"] = df.apply(lambda x: fun(x), axis=1)
df["t2"] = df.apply(lambda x: fun(x, "2020-02-13", "2020-02-29", 2, 0, 1), axis=1)
df["t3"] = df.apply(lambda x: fun(x, "2020-03-11", "2020-03-29", 4, 0, 0), axis=1)
df["t_function"] = df["t1"].fillna(0) + df["t2"].fillna(0) + df["t3"].fillna(0)

df

Here is the output:

    Date        t_factor  t1   t2     t3   t_function
0   2020-02-01  5         4.0  NaN    NaN  4.0
1   2020-02-03  23        6.0  NaN    NaN  6.0
2   2020-02-06  14        9.0  NaN    NaN  9.0
3   2020-02-09  23        NaN  NaN    NaN  0.0
4   2020-02-13  30        NaN  3.0    NaN  3.0
5   2020-02-20  29        NaN  66.0   NaN  66.0
6   2020-02-29  100       NaN  291.0  NaN  291.0
7   2020-03-11  38        NaN  NaN    4.0  4.0
8   2020-03-26  70        NaN  NaN    4.0  4.0
9   2020-03-29  70        NaN  NaN    4.0  4.0
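In this output the row for 2020-02-09 gets t_function = 0.0, because fillna(0) turns "no step applies" into zero, whereas the question's expected output shows NaN there. Summing across the three columns with min_count=1 keeps NaN when every column in the row is NaN. A sketch on a small excerpt of per-step values:

```python
import numpy as np
import pandas as pd

# Per-step values as produced by the apply calls above (small excerpt).
df = pd.DataFrame({"t1": [4.0, np.nan, np.nan],
                   "t2": [np.nan, np.nan, 3.0],
                   "t3": [np.nan, np.nan, np.nan]})

# min_count=1: a row needs at least one non-NaN value, otherwise the sum is NaN.
df["t_function"] = df[["t1", "t2", "t3"]].sum(axis=1, min_count=1)
print(df)
```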
