Avoid iteration over rows for computation in pandas
County date available_wheat usage rate (%) consumption
A 1/2/2021 100.00 3
A 1/3/2021 3
A 1/4/2021 2
A 1/5/2021 5
A 1/6/2021 1
A 1/7/2021 2
A 1/8/2021 5
A 1/9/2021 6
A 1/10/2021 7
A 1/11/2021 8
A 1/12/2021 1
A 1/13/2021 2
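Above is my dataframe; I need to fill in the available_wheat column. available_wheat has to be reduced by usage rate (%) on each row, which I can do with iterrows ( https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iterrows.html ).
My dataframe is quite large compared to what is shown, so the question is: can the computation be vectorized, using a lambda or some other way?
Expected output: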
County date available_wheat usage rate (%) consumption
A 1/2/2021 100.00 3 3.00
A 1/3/2021 97.00 3 2.91
A 1/4/2021 94.09 2 1.88
A 1/5/2021 92.21 5 4.61
A 1/6/2021 87.60 1 0.88
A 1/7/2021 86.72 2 1.73
A 1/8/2021 84.99 5 4.25
A 1/9/2021 80.74 6 4.84
A 1/10/2021 75.89 7 5.31
A 1/11/2021 70.58 8 5.65
A 1/12/2021 64.93 1 0.65
A 1/13/2021 64.29 2 1.29
You need to use a shifted cumprod of the usage rate:
factor = df['usage rate (%)'].shift(fill_value=0).rsub(100).div(100).cumprod()
df['available_wheat'] = df['available_wheat'].ffill().mul(factor)
df['consumption'] = df['usage rate (%)'].mul(df['available_wheat']).div(100)
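The reason this works is that each row's available amount is the previous row's amount times the fraction left over, so row t equals the starting amount times the product of (1 - rate/100) over all earlier rows. The sketch below rebuilds the sample data from the question and cross-checks the cumprod result against an explicit loop. The names `keep`, `looped` and `max_diff` are illustrative only, and the date column is omitted for brevity.

```python
import pandas as pd

# Sample data from the question; only the first available_wheat value is known.
rates = [3, 3, 2, 5, 1, 2, 5, 6, 7, 8, 1, 2]
nan = float("nan")
df = pd.DataFrame({
    "County": ["A"] * len(rates),
    "available_wheat": [100.0] + [nan] * (len(rates) - 1),
    "usage rate (%)": rates,
})

# Fraction of the stock that survives each day, e.g. 3% usage -> 0.97.
keep = 1 - df["usage rate (%)"] / 100

# Shift by one row so each day sees only the *previous* days' usage,
# then take the running product.
factor = keep.shift(fill_value=1.0).cumprod()

df["available_wheat"] = df["available_wheat"].ffill() * factor
df["consumption"] = df["available_wheat"] * df["usage rate (%)"] / 100

# Cross-check against an explicit loop over the same recurrence.
stock, looped = 100.0, []
for r in rates:
    looped.append(stock)
    stock *= 1 - r / 100

max_diff = max(abs(a - b) for a, b in zip(df["available_wheat"], looped))
print(df.round(2))
print("max difference vs loop:", max_diff)
```

The loop and the vectorized version should agree up to floating-point noise; only the vectorized one avoids Python-level iteration over rows.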
NB. If you have several counties and want to handle them independently, then do all of this in a groupby. Add round(2) to get 2 decimal places.
output:
County date available_wheat usage rate (%) consumption
0 A 1/2/2021 100.000000 3 3.000000
1 A 1/3/2021 97.000000 3 2.910000
2 A 1/4/2021 94.090000 2 1.881800
3 A 1/5/2021 92.208200 5 4.610410
4 A 1/6/2021 87.597790 1 0.875978
5 A 1/7/2021 86.721812 2 1.734436
6 A 1/8/2021 84.987376 5 4.249369
7 A 1/9/2021 80.738007 6 4.844280
8 A 1/10/2021 75.893727 7 5.312561
9 A 1/11/2021 70.581166 8 5.646493
10 A 1/12/2021 64.934673 1 0.649347
11 A 1/13/2021 64.285326 2 1.285707
Same logic in a groupby:
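# transform returns a result aligned with df's original index, so it can be multiplied
# directly (apply may prepend the group key to the index in pandas >= 2.0)
factor = (df.groupby('County')['usage rate (%)']
            .transform(lambda s: s.shift(fill_value=0).rsub(100).div(100).cumprod())
         )
df['available_wheat'] = df.groupby('County')['available_wheat'].ffill().mul(factor)
df['consumption'] = df['usage rate (%)'].mul(df['available_wheat']).div(100)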
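To make the per-county behaviour concrete, here is a small sketch with two counties. County B and its numbers are invented for illustration and are not from the question. It uses `transform` so that the factor keeps the dataframe's original index, and it assumes each county's first row carries the starting stock.

```python
import pandas as pd

nan = float("nan")
# County "B" and its values are made up for this example.
df = pd.DataFrame({
    "County": ["A", "A", "A", "B", "B", "B"],
    "available_wheat": [100.0, nan, nan, 200.0, nan, nan],
    "usage rate (%)": [3, 3, 2, 10, 50, 5],
})

# Running product of the previous rows' remaining fractions, restarted per county.
factor = df.groupby("County")["usage rate (%)"].transform(
    lambda s: s.shift(fill_value=0).rsub(100).div(100).cumprod()
)

# ffill within each county so B's starting stock is never filled from A.
df["available_wheat"] = df.groupby("County")["available_wheat"].ffill().mul(factor)
df["consumption"] = df["usage rate (%)"].mul(df["available_wheat"]).div(100)

print(df.round(2))
```

Each county restarts from its own first value, so B begins at 200 regardless of how much A has left.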
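# Alternative: row-wise apply with a running total (not vectorized)
import numpy as np
import pandas as pd

# df1 is the dataframe from the question
available_wheat2 = 100  # running stock, updated row by row

def function1(ss: pd.Series):
    global available_wheat2
    ss['available_wheat'] = available_wheat2
    # consumption is rounded at every step, so the rounding carries forward
    ss['consumption'] = np.round(available_wheat2 * ss['usage rate (%)'] / 100, 2)
    available_wheat2 = available_wheat2 - ss['consumption']
    return ss

df1.apply(function1, axis=1)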
out:
County date available_wheat usage rate (%) consumption
0 A 1/2/2021 100.00 3 3.00
1 A 1/3/2021 97.00 3 2.91
2 A 1/4/2021 94.09 2 1.88
3 A 1/5/2021 92.21 5 4.61
4 A 1/6/2021 87.60 1 0.88
5 A 1/7/2021 86.72 2 1.73
6 A 1/8/2021 84.99 5 4.25
7 A 1/9/2021 80.74 6 4.84
8 A 1/10/2021 75.90 7 5.31
9 A 1/11/2021 70.59 8 5.65
10 A 1/12/2021 64.94 1 0.65
11 A 1/13/2021 64.29 2 1.29
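This output differs slightly from the cumprod result in a few rows (for example 75.90 versus 75.89 on 1/10/2021). That appears to come from rounding consumption to 2 decimals at every step before subtracting it, rather than rounding once at the end. The plain-Python sketch below reproduces both behaviours; the helper `run` and its `round_each_step` flag are hypothetical names introduced only for this comparison.

```python
rates = [3, 3, 2, 5, 1, 2, 5, 6, 7, 8, 1, 2]

def run(rates, start=100.0, round_each_step=False):
    """Return the available amount at the start of each day."""
    available = []
    stock = start
    for r in rates:
        available.append(stock)
        used = stock * r / 100
        if round_each_step:
            # Rounded consumption is what gets subtracted, so the error accumulates.
            used = round(used, 2)
        stock -= used
    return available

exact = run(rates)                           # round only when displaying
stepwise = run(rates, round_each_step=True)  # round consumption on every row

for day, (a, b) in enumerate(zip(exact, stepwise)):
    print(day, round(a, 2), round(b, 2))
```

Which variant is right depends on whether the real-world quantities are actually rounded each day; the vectorized cumprod corresponds to the unrounded recurrence.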