Apply a user-defined function involving another DataFrame to an entire DataFrame in pandas

I am trying to do some math between two DataFrames, df1 and df2, but I am finding it difficult to do with DataFrame.apply:

df1:

   number1  number2  number3  … 
0   0         0        0      …
1   0         0.25     0      …
2   0.1       0.1      0      …
3   0         0        0.3    …
4   0         0        0      … 

df2:

   number1  number2  number3    … 
0   2         3.3        6      …
1   2.1       3.4        6      …
2   2.2       3.2      5.8      …
3   2.1       3.4      6.2      …
4   2         4.0      6.4      … 

I want to change each element in df1 by following rules:

  1. change every 0 element in df1 to 1
  2. for each non-zero element in df1, set df1.iloc[m,n] = (1-df1.iloc[m,n])/df2.shift(1).iloc[m,n], where (m,n) is the position of the non-zero element

I have code that works:

import numpy as np
import pandas as pd

df1_new=pd.DataFrame(1,index = df1.index,columns = df1.columns)
df2_sft=df2.shift(1)
m,n=np.where(np.array(df1)!=0)
for i in m:
  for j in n:
    df1_new.iloc[i,j]=(1-df1.iloc[i,j])/df2_sft.iloc[i,j]

But as you can see, it is ugly and incredibly slow when df1 and df2 are large. I believe there must be other ways to do this simple math much more quickly, and I hope you can help.

Also, I am always confused by apply and applymap. What is the difference, and when should I use one over the other?

You want to vectorize your code, which is to say, rather than using for loops, do the calculation on the whole DataFrame/array at once. Something like the following will be much faster:

In [11]: ((1 - df1) / df2_sft).where(df1 != 0, 1)
Out[11]:
    number1   number2  number3
0  1.000000  1.000000  1.00000
1  1.000000  0.227273  1.00000
2  0.428571  0.264706  1.00000
3  1.000000  1.000000  0.12069
4  1.000000  1.000000  1.00000
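
For anyone who wants to run this end to end, here is a minimal, self-contained sketch. It assumes only the three columns visible in the question (the elided columns are left out) and that df2_sft is df2.shift(1), as in the question's code.

```python
import pandas as pd

# Only the three columns shown in the question; the elided ones are omitted.
df1 = pd.DataFrame({
    "number1": [0, 0, 0.1, 0, 0],
    "number2": [0, 0.25, 0.1, 0, 0],
    "number3": [0, 0, 0, 0.3, 0],
})
df2 = pd.DataFrame({
    "number1": [2, 2.1, 2.2, 2.1, 2],
    "number2": [3.3, 3.4, 3.2, 3.4, 4.0],
    "number3": [6, 6, 5.8, 6.2, 6.4],
})

# Row i of df2_sft holds row i-1 of df2; the first row becomes NaN.
df2_sft = df2.shift(1)

# Compute the formula for every cell, then keep it only where df1 is
# non-zero and put 1 everywhere else.
result = ((1 - df1) / df2_sft).where(df1 != 0, 1)
print(result)
```

One thing to watch: because of the shift, a non-zero value in the first row of df1 would be divided by NaN and come out as NaN. The sample data has no such value.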

Note: this doesn't match the output of your code, because your loops (incorrectly) don't iterate over just the non-zero elements. You iterate over every item in n for each item in m, rather than over the zipped (m, n) pairs.
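
To make the note concrete: np.where returns the row positions and the column positions as two parallel arrays, so they have to be walked in pairs. Below is a minimal sketch of the loop the question presumably intended, using zip, on the three visible columns only. It is still slow for large frames and is shown only for comparison with the vectorized expression. Starting df1_new from 1.0 instead of 1 is my own choice, so the frame is float from the start.

```python
import numpy as np
import pandas as pd

# Only the three columns shown in the question; the elided ones are omitted.
df1 = pd.DataFrame({
    "number1": [0, 0, 0.1, 0, 0],
    "number2": [0, 0.25, 0.1, 0, 0],
    "number3": [0, 0, 0, 0.3, 0],
})
df2 = pd.DataFrame({
    "number1": [2, 2.1, 2.2, 2.1, 2],
    "number2": [3.3, 3.4, 3.2, 3.4, 4.0],
    "number3": [6, 6, 5.8, 6.2, 6.4],
})
df2_sft = df2.shift(1)

# Float frame of ones, so the assigned ratios do not need a dtype change.
df1_new = pd.DataFrame(1.0, index=df1.index, columns=df1.columns)

# m[k], n[k] is the (row, column) position of the k-th non-zero element.
m, n = np.where(df1.to_numpy() != 0)
for i, j in zip(m, n):
    df1_new.iloc[i, j] = (1 - df1.iloc[i, j]) / df2_sft.iloc[i, j]

# The paired loop and the vectorized expression should agree.
vectorized = ((1 - df1) / df2_sft).where(df1 != 0, 1)
print(np.allclose(df1_new.to_numpy(), vectorized.to_numpy()))
```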
