Apply to the entire DataFrame a user-defined function involving another DataFrame in pandas
I am trying to do some math between two DataFrames, df1 and df2, but I am finding it difficult to do with the pandas apply function:
df1:
number1 number2 number3 …
0 0 0 0 …
1 0 0.25 0 …
2 0.1 0.1 0 …
3 0 0 0.3 …
4 0 0 0 …
df2:
number1 number2 number3 …
0 2 3.3 6 …
1 2.1 3.4 6 …
2 2.2 3.2 5.8 …
3 2.1 3.4 6.2 …
4 2 4.0 6.4 …
I want to change each element in df1 according to the following rules:
I have code that works:
df1_new = pd.DataFrame(1, index=df1.index, columns=df1.columns)
df2_sft = df2.shift(1)
m, n = np.where(np.array(df1) != 0)
for i in m:
    for j in n:
        df1_new.iloc[i, j] = (1 - df1.iloc[i, j]) / df2_sft.iloc[i, j]
But as you can see, it is ugly and incredibly slow when df1 and df2 are large. I believe there must be other ways to do this simple math much faster, and I hope you can help.
Also, I am always confused by apply and applymap. What is the difference, and when should I use one over the other?
You want to vectorize your code, that is, do the calculation on the whole DataFrame/array rather than using for loops. Something like the following will be much, much faster:
In [11]: ((1 - df1) / df2_sft).where(df1 != 0, 1)
Out[11]:
number1 number2 number3
0 1.000000 1.000000 1.00000
1 1.000000 0.227273 1.00000
2 0.428571 0.264706 1.00000
3 1.000000 1.000000 0.12069
4 1.000000 1.000000 1.00000
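For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same expression. It assumes only the three columns shown in the question (the columns elided by `…` are left out), and the variable name `result` is just for illustration.

```python
import pandas as pd

# Sample data copied from the question (three visible columns only).
df1 = pd.DataFrame({
    "number1": [0, 0, 0.1, 0, 0],
    "number2": [0, 0.25, 0.1, 0, 0],
    "number3": [0, 0, 0, 0.3, 0],
})
df2 = pd.DataFrame({
    "number1": [2, 2.1, 2.2, 2.1, 2],
    "number2": [3.3, 3.4, 3.2, 3.4, 4.0],
    "number3": [6, 6, 5.8, 6.2, 6.4],
})

# Shift df2 down one row so each row lines up with the previous row of df2.
df2_sft = df2.shift(1)

# Compute the ratio everywhere, then keep it only where df1 is non-zero;
# every other cell is replaced with 1.
result = ((1 - df1) / df2_sft).where(df1 != 0, 1)
print(result)
```

The whole calculation runs as a few array operations, so there is no Python-level loop over cells. The first row of `df2_sft` is NaN, but since the first row of `df1` is all zeros in this sample, those cells are replaced with 1.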
Note: this doesn't match the output of your code, because your loops (incorrectly) don't visit only the non-zero elements: they iterate over every item in n for each item in m, rather than over the zipped (i, j) pairs.
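To make the difference concrete, here is a small sketch that compares the positions visited by the nested loops with those visited by `zip(m, n)`. It uses the three visible columns of df1 from the question as a plain NumPy array; the names `df1_values`, `nested`, and `zipped` are only for illustration.

```python
import numpy as np

# Values of df1 from the question (three visible columns only).
df1_values = np.array([
    [0, 0, 0],
    [0, 0.25, 0],
    [0.1, 0.1, 0],
    [0, 0, 0.3],
    [0, 0, 0],
])

# Row indices and column indices of the non-zero cells, in matching order.
m, n = np.where(df1_values != 0)

# Nested loops pair every row index with every column index.
nested = {(int(i), int(j)) for i in m for j in n}

# zip pairs each row index with its own column index.
zipped = {(int(i), int(j)) for i, j in zip(m, n)}

print(len(nested))     # 9
print(sorted(zipped))  # [(1, 1), (2, 0), (2, 1), (3, 2)]
```

On this sample the nested loops touch nine cells, including ones where df1 is zero, while the zipped pairs cover exactly the four non-zero cells.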