Applying a user-defined function to a whole DataFrame that involves another DataFrame in pandas

Question

I am trying to do some math between two data frames, df1 and df2, but I find it difficult to use the DataFrame.apply function:

df1:

   number1  number2  number3  … 
0   0         0        0      …
1   0         0.25     0      …
2   0.1       0.1      0      …
3   0         0        0.3    …
4   0         0        0      …

df2:

   number1  number2  number3    … 
0   2         3.3        6      …
1   2.1       3.4        6      …
2   2.2       3.2      5.8      …
3   2.1       3.4      6.2      …
4   2         4.0      6.4      …

I want to change each element in df1 according to the following rules:

Change every 0 element in df1 to 1.
For each non-zero element in df1, set df1.iloc[m,n] = (1-df1.iloc[m,n])/df2.shift(1).iloc[m,n], where (m, n) is the position of the non-zero element.

I have code that works:

df1_new=pd.DataFrame(1,index = df1.index,columns = df1.columns)
df2_sft=df2.shift(1)
m,n=np.where(np.array(df1)!=0)
for i in m:
  for j in n:
    df1_new.iloc[i,j]=(1-df1.iloc[i,j])/df2_sft.iloc[i,j]

But as you can see, it is ugly and incredibly slow when df1 and df2 are large. I believe there must be much faster ways to do this simple math, and I hope you can help.

Also, I am always confused by apply and applymap. What is the difference, and when should I use one over the other?

Answer 1

You want to vectorize your code: rather than using for loops, do the calculation on the whole DataFrame/array at once. Something like the following will be much faster:

In [11]: ((1 - df1) / df2_sft).where(df1 != 0, 1)
Out[11]:
    number1   number2  number3
0  1.000000  1.000000  1.00000
1  1.000000  0.227273  1.00000
2  0.428571  0.264706  1.00000
3  1.000000  1.000000  0.12069
4  1.000000  1.000000  1.00000

Note: this doesn't match the output of your code, because your loop (incorrectly) does not iterate over just the non-zero elements: it visits every item of m for each item of n, rather than the zipped (m, n) pairs.
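To make the difference concrete, here is a minimal, self-contained sketch. It assumes the sample data shown in the question (only the three visible columns; the elided columns are left out), rewrites the loop to walk the zipped positions, and compares it with the vectorized expression. The variable names `vectorized`, `looped`, `rows`, `cols`, and `same` are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (visible columns only).
df1 = pd.DataFrame({
    "number1": [0, 0, 0.1, 0, 0],
    "number2": [0, 0.25, 0.1, 0, 0],
    "number3": [0, 0, 0, 0.3, 0],
})
df2 = pd.DataFrame({
    "number1": [2, 2.1, 2.2, 2.1, 2],
    "number2": [3.3, 3.4, 3.2, 3.4, 4.0],
    "number3": [6, 6, 5.8, 6.2, 6.4],
})
df2_sft = df2.shift(1)

# Vectorized version: compute the formula everywhere, then put 1
# wherever df1 is zero.
vectorized = ((1 - df1) / df2_sft).where(df1 != 0, 1)

# Loop version that visits only the non-zero positions by zipping
# the row and column indices returned by np.where.
looped = pd.DataFrame(1.0, index=df1.index, columns=df1.columns)
rows, cols = np.where(df1.to_numpy() != 0)
for i, j in zip(rows, cols):
    looped.iloc[i, j] = (1 - df1.iloc[i, j]) / df2_sft.iloc[i, j]

# Both approaches agree on this sample data.
same = np.allclose(vectorized.to_numpy(), looped.to_numpy())
print(same)  # True
```

The zipped loop is shown only to check the result; on large frames the vectorized expression is the one to use, since it avoids a Python-level assignment per element.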

Answer 1: 3, accepted, 2015-02-03 05:57:08
