
Converting float32 to float64 takes more time than expected in NumPy

I had a performance issue in a NumPy project, and then I realized that about three-fourths of the execution time was spent on a single line of code:

error = abs(detected_matrix[i, step] - original_matrix[j, new])

and when I changed the line to

error = abs(original_matrix[j, new] - detected_matrix[i, step])

the problem disappeared.

I realized that the dtype of original_matrix was float64 while the dtype of detected_matrix was float32. Casting either of these two variables to the other's dtype solved the problem.
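For example, a minimal sketch of the fix (the variable names come from the snippet above; the shapes here are hypothetical):

import numpy as np

# Hypothetical shapes; the point is only the dtype mismatch.
original_matrix = np.zeros((100, 100), dtype=np.float64)
detected_matrix = np.zeros((100, 100), dtype=np.float32)

# Casting either matrix so both share one dtype avoids the slow
# mixed-type scalar path on every element-wise subtraction.
detected_matrix = detected_matrix.astype(np.float64)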

I was wondering: is this a well-known issue?

Here is a sample code that reproduces the problem:

from timeit import timeit
import numpy as np

# NumPy scalars (not Python floats) of two different precisions.
f64 = np.array([1.0], dtype='float64')[0]
f32 = np.array([1.0], dtype='float32')[0]

# float32 operand on the left...
timeit_result = timeit(stmt="abs(f32 - f64)", number=1000000, globals=globals())
print(timeit_result)

# ...versus float64 operand on the left.
timeit_result = timeit(stmt="abs(f64 - f32)", number=1000000, globals=globals())
print(timeit_result)

Output on my computer:

2.8707289
0.15719420000000017

which is quite strange.
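For what it's worth, both orders promote to the same result type, so the asymmetry is not a difference in the result itself:

import numpy as np

f64 = np.float64(1.0)
f32 = np.float32(1.0)

# Both orders produce a float64 result; only the dispatch path differs.
print(np.result_type(f32, f64))          # float64
print(type(f32 - f64), type(f64 - f32))  # both <class 'numpy.float64'>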

TL;DR: please use NumPy >= 1.23.0.

This problem has been fixed in NumPy 1.23.0 (more specifically, in version 1.23.0-rc1). This pull request rewrites the scalar math logic so as to make it faster in many cases, including your specific use case. With version 1.22.4, the first expression is about 10 times slower than the second one; this is also true for earlier versions like 1.21.5. In 1.23.0, the first expression is only 10%-15% slower, but both take very little time: about 140 ns per operation versus 122 ns. The small remaining difference is due to a slightly different path taken in the type-checking part of the code. For more information about this low-level behavior, please read this post.

Note that iterating over NumPy items is not meant to be very fast, nor is operating on NumPy scalars. If your code is limited by that, please consider converting NumPy scalars into Python ones, as stated in the 1.23.0 release notes:

Many operations on NumPy scalars are now significantly faster, although rare operations (e.g. with 0-D arrays rather than scalars) may be slower in some cases. However, even with these improvements users who want the best performance for their scalars, may want to convert a known NumPy scalar into a Python one using scalar.item().
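For example, a minimal sketch of that conversion, reusing the f32/f64 scalars from the benchmark above:

# Converting the NumPy scalars to plain Python floats up front
# keeps the hot loop on the fast CPython float path.
py32 = f32.item()  # plain Python float
py64 = f64.item()  # plain Python float
error = abs(py32 - py64)  # pure-Python arithmetic, no NumPy dispatch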

An even faster solution is to use Numba or Cython in this case, or simply to vectorize the enclosing loop if possible.
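As a sketch of the vectorized approach (the matrices and indices below are hypothetical stand-ins, since the structure of the original loop over step/new is not shown in the question):

import numpy as np

# Hypothetical stand-ins for the question's matrices (shapes guessed).
detected_matrix = np.random.rand(50, 200)
original_matrix = np.random.rand(50, 200)

i, j = 0, 0  # hypothetical row indices from the surrounding loop

# One whole-row operation replaces many per-element scalar subtractions;
# NumPy runs the loop in C, so no slow per-scalar dispatch is involved.
errors = np.abs(detected_matrix[i] - original_matrix[j])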
