[英]Pandas/Python Modeling Time-Series, Groups with Different Inputs
I am trying to model different scenarios for groups of assets in future years. 我正在尝试为未来几年的资产组建模不同的方案。 This is something I have accomplished very tediously in Excel, but want to leverage the large database I have built with Pandas.
这是我在Excel中非常繁琐的工作,但是我想利用我用Pandas建立的大型数据库。
Example: 例:
annual_group_cost = 0.02
df1:
year group x_count y_count value
2018 a 2 5 109000
2019 a 0 4 nan
2020 a 3 0 nan
2018 b 0 0 55000
2019 b 1 0 nan
2020 b 1 0 nan
2018 c 5 1 500000
2019 c 3 0 nan
2020 c 2 5 nan
df2:
group x_benefit y_cost individual_avg starting_value
a 0.2 0.72 1000 109000
b 0.15 0.75 20000 55000
c 0.15 0.70 20000 500000
I would like to update the values in df1, by taking the previous year's value (or starting value) and adding the x benefit, y cost, and annual cost. 我想更新df1中的值,方法是取上一年的值(或起始值),然后加上x收益,y成本和年度成本。 I am assuming this will take a function to accomplish, but I don't know of an efficient way to handle it.
我假设这需要一个功能来完成,但是我不知道一种有效的方法来处理它。
The final output I would like to have is: 我想要的最终输出是:
df1:
year group x_count y_count value
2018 a 2 5 103620
2019 a 0 4 98667.3
2020 a 3 0 97294.248
2018 b 0 0 53900
2019 b 1 0 56822
2020 b 1 0 59685.56
2018 c 5 1 495000
2019 c 3 0 497100
2020 c 2 5 420158
I achieved this by using: 我通过使用以下方法实现了这一点:
starting_value-(starting_value*annual_group_cost)+(x_count*(individual_avg*x_benefit))-(y_count*(individual_avg*y_cost))
Since subsequent new values are dependent upon previously calculated new values, this will need to involve (even if behind the scenes using eg apply
) a for loop: 由于后续的新值取决于先前计算的新值,因此这将需要涉及一个for循环(即使在幕后使用例如
apply
):
for i in range(1, len(df1)):
if np.isnan(df1.loc[i, 'value']):
df1.loc[i, 'value'] = df1.loc[i-1, 'value'] #your logic here
You should merge the two tables together and then just do the functions on the data Series 您应该将两个表合并在一起,然后对数据系列执行功能
hold = df_1.merge(df_2, on=['group']).fillna(0)
x = (hold.x_count*(hold.individual_avg*hold.x_benefit))
y = (hold.y_count*(hold.individual_avg*hold.y_cost))
for year in hold.year.unique():
start = hold.loc[hold.year == year, 'starting_value']
hold.loc[hold.year == year, 'value'] = (start-(start*annual_group_cost)+x-y)
if year != hold.year.max():
hold.loc[hold.year == year + 1, 'starting_value'] = hold.loc[hold.year == year, 'value'].values
hold.drop(['x_benefit', 'y_cost', 'individual_avg', 'starting_value'],axis=1)
Will give you 会给你
year group x_count y_count value
0 2018 a 2 5 103620.0
1 2019 a 0 4 98667.6
2 2020 a 3 0 97294.25
3 2018 b 0 0 53900.0
4 2019 b 1 0 55822.0
5 2020 b 1 0 57705.56
6 2018 c 5 1 491000.0
7 2019 c 3 0 490180.0
8 2020 c 2 5 416376.4
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.