I am trying to do some math between two data frames, df1 and df2, but I am finding it difficult to use the pandas apply function:
df1:
   number1  number2  number3  …
0        0        0        0  …
1        0     0.25        0  …
2      0.1      0.1        0  …
3        0        0      0.3  …
4        0        0        0  …
df2:
   number1  number2  number3  …
0        2      3.3        6  …
1      2.1      3.4        6  …
2      2.2      3.2      5.8  …
3      2.1      3.4      6.2  …
4        2      4.0      6.4  …
I want to change each element in df1 by following rules:
I have code that works:
df1_new = pd.DataFrame(1, index=df1.index, columns=df1.columns)
df2_sft = df2.shift(1)
m, n = np.where(np.array(df1) != 0)
for i in m:
    for j in n:
        df1_new.iloc[i, j] = (1 - df1.iloc[i, j]) / df2_sft.iloc[i, j]
But as you can see, it is ugly and incredibly slow when df1 and df2 are large. I believe there must be a much faster way to do this simple math, and I hope you can help.
Also, I am always confused by apply and applymap. What is the difference, and when should I use one over the other?
You want to vectorize your code, which is to say, rather than using for loops, do the calculation on the whole DataFrame/array. Something like the following will be much faster:
In [11]: ((1 - df1) / df2_sft).where(df1 != 0, 1)
Out[11]:
    number1   number2  number3
0  1.000000  1.000000  1.00000
1  1.000000  0.227273  1.00000
2  0.428571  0.264706  1.00000
3  1.000000  1.000000  0.12069
4  1.000000  1.000000  1.00000
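For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes only the three columns shown in the question (the columns elided with "…" are left out), and the frames are rebuilt by hand from the printed values:

```python
import pandas as pd

# Sample data copied from the question; only the three visible columns.
df1 = pd.DataFrame({
    "number1": [0, 0, 0.1, 0, 0],
    "number2": [0, 0.25, 0.1, 0, 0],
    "number3": [0, 0, 0, 0.3, 0],
})
df2 = pd.DataFrame({
    "number1": [2, 2.1, 2.2, 2.1, 2],
    "number2": [3.3, 3.4, 3.2, 3.4, 4.0],
    "number3": [6, 6, 5.8, 6.2, 6.4],
})

# Shift df2 down by one row, so row i holds the values of row i - 1.
df2_sft = df2.shift(1)

# Compute (1 - df1) / df2_sft for every cell at once, then keep the
# result only where df1 is non-zero; every other cell becomes 1.
df1_new = ((1 - df1) / df2_sft).where(df1 != 0, 1)
print(df1_new)
```

The first row of df2_sft is all NaN after the shift, but since df1 is zero there, where replaces those cells with 1 as well.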
Note: this doesn't match the output of your code, because your code (incorrectly) doesn't iterate over just the non-zero elements: the nested loops visit every combination of the items in m and n, rather than only the zipped (row, column) pairs.
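To illustrate the note above, here is a sketch of what the loop could look like if it walked the zipped (row, column) pairs instead. It uses the same hand-built sample data (three visible columns only), and it initialises the result with 1.0 rather than 1 so the columns are float from the start; that is my choice, not something from the question. The names rows, cols, looped and vectorized are just illustrative:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; only the three visible columns.
df1 = pd.DataFrame({
    "number1": [0, 0, 0.1, 0, 0],
    "number2": [0, 0.25, 0.1, 0, 0],
    "number3": [0, 0, 0, 0.3, 0],
})
df2 = pd.DataFrame({
    "number1": [2, 2.1, 2.2, 2.1, 2],
    "number2": [3.3, 3.4, 3.2, 3.4, 4.0],
    "number3": [6, 6, 5.8, 6.2, 6.4],
})
df2_sft = df2.shift(1)

# np.where returns two parallel arrays: rows[k], cols[k] together give
# the position of the k-th non-zero element.
rows, cols = np.where(df1.to_numpy() != 0)

# Start from a float frame of ones and update only the non-zero positions.
looped = pd.DataFrame(1.0, index=df1.index, columns=df1.columns)
for i, j in zip(rows, cols):
    looped.iloc[i, j] = (1 - df1.iloc[i, j]) / df2_sft.iloc[i, j]

# Compare with the vectorized expression.
vectorized = ((1 - df1) / df2_sft).where(df1 != 0, 1)
print(np.allclose(looped, vectorized))
```

With zip the loop touches only the four non-zero cells, whereas the nested version runs len(m) * len(n) times and also overwrites cells where df1 is zero. The vectorized form is still the one to prefer for large frames.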