Global vs local variable efficiency in multiple function calls

First caveat: I understand that premature optimization is always bad. Second caveat: I'm fairly new to Python.

I'm reading in many millions of data chunks. Each chunk is 64 bits and is held in a numpy array. In order to do bit operations on a numpy.uint64 value, the desired shift amount must also be of the same type, numpy.uint64.

This can be accomplished either by casting the number each time or by making a variable of that type once.

import numpy

number1 = numpy.uint64(80000)
shift_amount = numpy.uint64(8)

# option 1: cast the shift amount on every use
number1 >> numpy.uint64(8)

# option 2: reuse the pre-built numpy.uint64
number1 >> shift_amount

Looping 10,000 times and timing it, option 2 always wins out, I assume because the overhead of creating a numpy integer is paid only once.
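For reference, a comparison along those lines can be made with timeit; this is only a sketch of an assumed setup, not the exact benchmark described above:

import timeit
import numpy

number1 = numpy.uint64(80000)
shift_amount = numpy.uint64(8)

# option 1: cast the shift amount inside the timed expression
t_cast = timeit.timeit(lambda: number1 >> numpy.uint64(8), number=10000)
# option 2: reuse the pre-built numpy.uint64
t_var = timeit.timeit(lambda: number1 >> shift_amount, number=10000)
print(t_cast, t_var)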

My current program calls a function for each chunk of data and processes the raw bits. This function is called millions of times and appends to a few different lists. Following the same idea, and this time using a globally defined value for the shift/bit operations, I tested two more variants:

import numpy as np

shift_amount = np.uint64(54)  # defined once at module level

def using_global(number1):
    global shift_amount
    number1 >> shift_amount

def using_local(number1):
    shift = np.uint64(54)  # a new numpy scalar is built on every call
    number1 >> shift

Looping these 10,000 times, the function using the global was always an order of magnitude faster. Question: is it bad practice to have a bunch (10+) of global variables? https://wiki.python.org/moin/PythonSpeed/PerformanceTips states that local variables will be faster; in this instance I found that not to be the case. My main loop simply calls the function once for each of the millions of data words, so that's probably inefficient too.
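For what it's worth, the comparison can be reproduced with timeit against the two functions above (a sketch with assumed inputs; absolute timings will vary by machine):

import timeit
import numpy as np

x = np.uint64(80000)
print("global:", timeit.timeit(lambda: using_global(x), number=10000))
print("local: ", timeit.timeit(lambda: using_local(x), number=10000))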

Python itself is not made for massive number crunching; numpy is. Put all data chunks into one numpy array. That is far faster than using Python loops and individual function calls:

import numpy

values = numpy.empty(1000000, dtype=numpy.uint64)
# fill in the data
values >>= 8  # shifts all one million elements in a single vectorized operation
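As a rough, illustrative comparison (made-up data, not the real chunks from the question), shifting element by element in a Python loop versus one vectorized statement:

import timeit
import numpy

values = numpy.arange(1_000_000, dtype=numpy.uint64)  # stand-in data
shift = numpy.uint64(8)

def shift_loop(data):
    # one Python-level shift per element
    return [v >> shift for v in data]

def shift_vectorized(data):
    # a single ufunc call over the whole array
    return data >> shift

print("loop:      ", timeit.timeit(lambda: shift_loop(values), number=1))
print("vectorized:", timeit.timeit(lambda: shift_vectorized(values), number=1))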

If your shift amount depends on, say, the highest nibble, you can build a lookup table mapping the nibble values 0 to 15 to the corresponding shifts:

# shift_by_nibble[n] is the shift amount to use when the highest nibble is n
shift_by_nibble = numpy.array([8,16,24,30,34,60,50,40,44,48,52,56,62,4,12,20],
                              dtype=numpy.uint8)
values >>= shift_by_nibble[values >> 60]
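To make the indexing explicit: values >> 60 keeps only the top 4 bits of each 64-bit word, and that nibble (0 to 15) selects a per-element shift from the table. A small example with made-up values:

import numpy

shift_by_nibble = numpy.array([8,16,24,30,34,60,50,40,44,48,52,56,62,4,12,20],
                              dtype=numpy.uint8)

values = numpy.array([0x123456789ABCDEF0, 0xF000000000000001], dtype=numpy.uint64)

nibbles = values >> numpy.uint64(60)   # highest nibbles: 1 and 15
values >>= shift_by_nibble[nibbles]    # per-element shifts: 16 and 20 bits
print(values)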

using_global references a global variable. using_local references a local variable, but it also includes a call to np.uint64() on every invocation, and it is that per-call object construction, rather than the local variable lookup itself, that accounts for most of the difference you measured.

Unless I'm misunderstanding the issue, another valid option would be to pass the variable shift_amount into the function as:

def my_func(number1, shift_amount):
    return number1 >> shift_amount

and then call it as new_number = my_func(old_number, shift_amount).
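Timing that variant the same way (again just a sketch, with shift_amount built once outside the loop as before) should show the per-call np.uint64() construction cost gone:

import timeit
import numpy as np

shift_amount = np.uint64(54)
x = np.uint64(80000)

print(timeit.timeit(lambda: my_func(x, shift_amount), number=10000))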
