Apply a user-defined function involving another DataFrame to an entire DataFrame in pandas

I am trying to do some math between two DataFrames, df1 and df2, but I find it difficult to use DataFrame.apply:

df1:

   number1  number2  number3  … 
0   0         0        0      …
1   0         0.25     0      …
2   0.1       0.1      0      …
3   0         0        0.3    …
4   0         0        0      … 

df2:

   number1  number2  number3    … 
0   2         3.3        6      …
1   2.1       3.4        6      …
2   2.2       3.2      5.8      …
3   2.1       3.4      6.2      …
4   2         4.0      6.4      … 

I want to change each element in df1 according to the following rules:

  1. Change every 0 element in df1 to 1.
  2. Replace each non-zero element in df1 with (1 - df1.iloc[m,n]) / df2.shift(1).iloc[m,n], where (m, n) is the position of that non-zero element.
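Written out as a formula, with $a_{m,n}$ for `df1.iloc[m, n]`, $b_{m,n}$ for `df2.iloc[m, n]` and $r_{m,n}$ for the new value (notation introduced here only for illustration), the two rules say the following. Because `shift(1)` moves every row of df2 down by one, `df2.shift(1).iloc[m, n]` is the previous row's value $b_{m-1,n}$:

$$
r_{m,n} =
\begin{cases}
1, & a_{m,n} = 0,\\
\dfrac{1 - a_{m,n}}{b_{m-1,n}}, & a_{m,n} \neq 0.
\end{cases}
$$

For $m = 0$ the shifted value is NaN, so a non-zero element in the first row would become NaN.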

Currently I have code that works:

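import numpy as np
import pandas as pd

df1_new = pd.DataFrame(1.0, index=df1.index, columns=df1.columns)  # start with every element set to 1 (float, so float results fit)
df2_sft = df2.shift(1)  # row m of df2_sft holds row m-1 of df2
m, n = np.where(np.array(df1) != 0)  # row and column indices of the non-zero elements of df1
for i in m:
    for j in n:
        df1_new.iloc[i, j] = (1 - df1.iloc[i, j]) / df2_sft.iloc[i, j]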

As you can see, it is ugly and becomes very slow when df1 and df2 are large. I believe there must be much faster ways to do this simple calculation; any help would be appreciated.

Also, I am always confused by apply and applymap: what is the difference, and when should I use one over the other?
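A minimal sketch of the difference, on a hypothetical toy frame `df` that is not part of the question: `apply` passes a whole column (or a whole row, with `axis=1`) to the function as a Series, while `applymap` passes one scalar element at a time. In pandas 2.1 and later `applymap` is deprecated in favour of `DataFrame.map`, so the sketch uses whichever method exists.

```python
import pandas as pd

# Hypothetical toy data, only to show what each method hands to the function.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# apply (default axis=0): the function receives one column (a Series) at a time.
col_range = df.apply(lambda col: col.max() - col.min())

# apply with axis=1: the function receives one row (a Series) at a time.
row_sum = df.apply(lambda row: row["a"] + row["b"], axis=1)

# Element-wise: the function receives a single scalar value.
# DataFrame.applymap was renamed to DataFrame.map in pandas 2.1; fall back on older versions.
elementwise = df.map if hasattr(df, "map") else df.applymap
doubled = elementwise(lambda x: x * 2)
```

For plain arithmetic like the rules in this question, neither method is necessary: operating on whole DataFrames at once avoids calling a Python function per column, row or element.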

You want to vectorize your code, that is, do the calculation on the whole DataFrame/array instead of using for loops. Something like the following will be much, much faster:

In [11]: ((1 - df1) / df2_sft).where(df1 != 0, 1)
Out[11]:
    number1   number2  number3
0  1.000000  1.000000  1.00000
1  1.000000  0.227273  1.00000
2  0.428571  0.264706  1.00000
3  1.000000  1.000000  0.12069
4  1.000000  1.000000  1.00000
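A self-contained version of the one-liner above, rebuilding only the three columns shown in the question (the elided "…" columns are left out). It also shows two equivalent spellings, offered as a sketch on the sample data rather than a benchmark: `DataFrame.mask`, which replaces values where the condition is True (the inverse of `where`), and `numpy.where` applied to the underlying arrays.

```python
import numpy as np
import pandas as pd

# Visible part of the question's example frames.
cols = ["number1", "number2", "number3"]
df1 = pd.DataFrame(
    [[0, 0, 0], [0, 0.25, 0], [0.1, 0.1, 0], [0, 0, 0.3], [0, 0, 0]], columns=cols
)
df2 = pd.DataFrame(
    [[2, 3.3, 6], [2.1, 3.4, 6], [2.2, 3.2, 5.8], [2.1, 3.4, 6.2], [2, 4.0, 6.4]],
    columns=cols,
)

df2_sft = df2.shift(1)  # row m now holds df2's row m - 1; row 0 becomes NaN

# Compute the formula everywhere, keep it where df1 is non-zero, use 1 elsewhere.
result = ((1 - df1) / df2_sft).where(df1 != 0, 1)

# Same thing with mask: replace values where the condition (df1 == 0) is True.
result_mask = ((1 - df1) / df2_sft).mask(df1 == 0, 1)

# NumPy variant: np.where returns a plain array, so wrap it back into a DataFrame.
result_np = pd.DataFrame(
    np.where(df1 != 0, (1 - df1) / df2_sft, 1.0),
    index=df1.index,
    columns=df1.columns,
)
```

All three compute the formula for every element and then discard it at the zero positions; that extra arithmetic on whole arrays is typically still far cheaper than element-by-element `iloc` access inside Python loops.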

Note: this doesn't match the output of your code, because your code (incorrectly) doesn't iterate over just the non-zero elements: it iterates over every item in m for each item in n, rather than over the zipped (m, n) pairs.
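A sketch of the question's loop with that fix applied (iterating over `zip(m, n)` so each row index is paired with its own column index), checked against the vectorized one-liner. The inputs here are hypothetical random frames, not the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical random inputs: df1 has a mix of zero and non-zero entries.
rng = np.random.default_rng(0)
shape = (6, 4)
df1 = pd.DataFrame(rng.choice([0.0, 0.1, 0.25, 0.3], size=shape), columns=list("abcd"))
df2 = pd.DataFrame(rng.uniform(2.0, 7.0, size=shape), columns=list("abcd"))
df2_sft = df2.shift(1)

# Loop version with the fix: zip visits only the (row, column) pairs that are non-zero.
loop_result = pd.DataFrame(1.0, index=df1.index, columns=df1.columns)
m, n = np.where(df1.to_numpy() != 0)
for i, j in zip(m, n):
    loop_result.iloc[i, j] = (1 - df1.iloc[i, j]) / df2_sft.iloc[i, j]

# Vectorized version from the answer.
vec_result = ((1 - df1) / df2_sft).where(df1 != 0, 1)
```

With the nested `for i in m: for j in n:` loop instead, every row in `m` is combined with every column in `n`, so positions where df1 is zero can be overwritten with `1 / df2_sft.iloc[i, j]` instead of staying at 1.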
