Calculate a specific function from the Date column and user-specified input parameters in Pandas
I have a DataFrame df, shown below.
Date t_factor
2020-02-01 5
2020-02-03 23
2020-02-06 14
2020-02-09 23
2020-02-10 23
2020-02-11 23
2020-02-13 30
2020-02-20 29
2020-02-29 100
2020-03-01 38
2020-03-10 38
2020-03-11 38
2020-03-26 70
2020-03-29 70
From this I want to create a function that computes a column named t_function from the calculated values t1, t2 and t3.
The user will enter the following parameters.
Step1:
Enter start_date1 = 2020-02-01
Enter end_date1 = 2020-02-06
Enter a0 = 3
Enter a1 = 1
Enter a2 = 0
Calculate t1 as the day count from start_date1 to each value in the Date column, up to end_date1, so that t1 is 1 on start_date1 = 2020-02-01.
t_function = a0 + a1*t1 + a2*(t1)**2
Step2:
Enter start_date2 = 2020-02-13
Enter end_date2 = 2020-02-29
Enter a0 = 2
Enter a1 = 0
Enter a2 = 1
Calculate the time in days as t2, which is 1 on start_date2 = 2020-02-13 and increases by one per calendar day until end_date2.
t_function = a0 + a1*t2 + a2*(t2)**2
Step3:
Enter start_date3 = 2020-03-11
Enter end_date3 = 2020-03-29
Enter a0 = 4
Enter a1 = 0
Enter a2 = 0
Calculate the time in days as t3, which is 1 on start_date3 = 2020-03-11 and increases by one per calendar day until end_date3.
t_function = a0 + a1*t3 + a2*(t3)**2
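To make the three steps concrete, here is a minimal sketch of the calculation for a single date, using only the standard library. The helper names day_number and t_function are made up for illustration; they are not part of pandas.

```python
from datetime import date

def day_number(d, start, end):
    """Return the 1-based day count of d within [start, end], or None outside it."""
    if start <= d <= end:
        return (d - start).days + 1
    return None

def t_function(t, a0, a1, a2):
    """Evaluate a0 + a1*t + a2*t**2 for a single day count t."""
    return a0 + a1 * t + a2 * t ** 2

# Step 2 parameters from the question
start2, end2 = date(2020, 2, 13), date(2020, 2, 29)
t2 = day_number(date(2020, 2, 20), start2, end2)
value = t_function(t2, a0=2, a1=0, a2=1)
print(t2, value)  # prints "8 66"
```

A date outside the range, such as 2020-02-09, gives None here, which corresponds to NaN in the DataFrame.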
Expected output:
Date t_factor t1 t2 t3 t_function
2020-02-01 5 1 NaN NaN 4
2020-02-03 23 3 NaN NaN 6
2020-02-06 14 6 NaN NaN 9
2020-02-09 23 NaN NaN NaN NaN
2020-02-10 23 NaN NaN NaN NaN
2020-02-11 23 NaN NaN NaN NaN
2020-02-13 30 NaN 1 NaN 3
2020-02-20 29 NaN 8 NaN 66
2020-02-29 100 NaN 17 NaN 291
2020-03-01 38 NaN NaN NaN NaN
2020-03-10 38 NaN NaN NaN NaN
2020-03-11 38 NaN NaN 1 4
2020-03-26 70 NaN NaN 16 4
2020-03-29 70 NaN NaN 19 4
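Note: the initial start date, start_date1, should be the first date in the Date column. The final end date, end_date3, should be the last date in the Date column. The t_factor column is not used.
I tried the code below to calculate t1, but I got confused, since I am new to Python and pandas.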
df['t1'] = (df['Date'] - df.at[0, 'Date']).dt.days + 1
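The line above counts days from the first row for every row in the frame, so it never stops at end_date1. One possible way to limit it to the range, sketched here on a shortened copy of the data, is to mask the counter with Series.between and Series.where. The variable names days, start_date1 and end_date1 are assumptions for this example.

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime(
    ["2020-02-01", "2020-02-03", "2020-02-06", "2020-02-09", "2020-02-13"])})

start_date1 = pd.Timestamp("2020-02-01")
end_date1 = pd.Timestamp("2020-02-06")

# 1-based day count from start_date1 for every row
days = (df["Date"] - start_date1).dt.days + 1
# Keep the count only for rows inside [start_date1, end_date1]; NaN elsewhere
df["t1"] = days.where(df["Date"].between(start_date1, end_date1))
print(df)
```

Series.between includes both endpoints by default, which matches the question's ranges.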
Here is how I would go about it:
import numpy as np
import pandas as pd
from datetime import datetime
from io import StringIO

# Sample data (a subset of the rows in the question)
df = pd.read_csv(StringIO("""Date t_factor
2020-02-01 5
2020-02-03 23
2020-02-06 14
2020-02-09 23
2020-02-13 30
2020-02-20 29
2020-02-29 100
2020-03-11 38
2020-03-26 70
2020-03-29 70"""), sep=r"\s+", parse_dates=[0])
df

def fun(x, start="2020-02-01", end="2020-02-06", a0=3, a1=1, a2=0):
    # Parse the range boundaries given as YYYY-MM-DD strings
    start = datetime.strptime(start, "%Y-%m-%d")
    end = datetime.strptime(end, "%Y-%m-%d")
    if start <= x.Date <= end:
        # 1-based day count from start
        t = (x.Date - start) / np.timedelta64(1, 'D') + 1
        diff = a0 + a1*t + a2*t**2
    else:
        # Rows outside the range get NaN
        diff = np.nan
    return diff

# Here t1, t2 and t3 hold each step's function value, not the day counts
df["t1"] = df.apply(lambda x: fun(x), axis=1)
df["t2"] = df.apply(lambda x: fun(x, "2020-02-13", "2020-02-29", 2, 0, 1), axis=1)
df["t3"] = df.apply(lambda x: fun(x, "2020-03-11", "2020-03-29", 4, 0, 0), axis=1)
# fillna(0) means rows outside every range end up as 0.0 rather than NaN
df["t_function"] = df["t1"].fillna(0) + df["t2"].fillna(0) + df["t3"].fillna(0)
df
This is the output:
Date t_factor t1 t2 t3 t_function
0 2020-02-01 5 4.0 NaN NaN 4.0
1 2020-02-03 23 6.0 NaN NaN 6.0
2 2020-02-06 14 9.0 NaN NaN 9.0
3 2020-02-09 23 NaN NaN NaN 0.0
4 2020-02-13 30 NaN 3.0 NaN 3.0
5 2020-02-20 29 NaN 66.0 NaN 66.0
6 2020-02-29 100 NaN 291.0 NaN 291.0
7 2020-03-11 38 NaN NaN 4.0 4.0
8 2020-03-26 70 NaN NaN 4.0 4.0
9 2020-03-29 70 NaN NaN 4.0 4.0
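Note that this output stores the function values in t1, t2 and t3 and gives 0.0 for rows outside every range, whereas the question asks for day counters in those columns and NaN in t_function. A vectorized sketch that follows the question's layout more closely might look like the one below. The steps list and the loop variable names are assumptions for this example, and the data is the same shortened sample.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime([
    "2020-02-01", "2020-02-03", "2020-02-06", "2020-02-09", "2020-02-13",
    "2020-02-20", "2020-02-29", "2020-03-11", "2020-03-26", "2020-03-29"])})

# (start, end, a0, a1, a2) for each step, taken from the question
steps = [
    ("2020-02-01", "2020-02-06", 3, 1, 0),
    ("2020-02-13", "2020-02-29", 2, 0, 1),
    ("2020-03-11", "2020-03-29", 4, 0, 0),
]

df["t_function"] = np.nan  # stays NaN for rows outside every range
for i, (start, end, a0, a1, a2) in enumerate(steps, start=1):
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    in_range = df["Date"].between(start, end)
    # 1-based day counter, NaN outside [start, end]
    t = ((df["Date"] - start).dt.days + 1).where(in_range)
    df[f"t{i}"] = t
    df.loc[in_range, "t_function"] = a0 + a1 * t[in_range] + a2 * t[in_range] ** 2

print(df[["Date", "t1", "t2", "t3", "t_function"]])
```

Counting 2020-03-11 as day 1 puts 2020-03-26 at day 16 and 2020-03-29 at day 19. If you prefer to keep the apply-based version, summing with df[["t1", "t2", "t3"]].sum(axis=1, min_count=1) instead of fillna(0) should also leave NaN where all three columns are NaN.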