
Global vs local variable efficiency in multiple function calls

First caveat: I understand that premature optimization is always bad. Second caveat: I'm fairly new to Python.

I'm reading in many millions of data chunks. Each chunk consists of 64 bits and is held in a numpy array. In order to do bit operations on a numpy.uint64 value, the desired bit-shift quantity must also be of the same type: numpy.uint64.

This can be accomplished either by casting the number on each use or by making a variable once:

import numpy

number1 = numpy.uint64(80000)
shift_amount = numpy.uint64(8)

# option 1: cast the shift amount on every use
number1 >> numpy.uint64(8)

# option 2: reuse a pre-built numpy.uint64 variable
number1 >> shift_amount

Looping 10,000 times and checking how long each took, option 2 always wins out, I assume because the overhead of creating a numpy integer is paid only once.
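A minimal sketch of how that comparison can be reproduced with the standard timeit module (the loop count mirrors the one above; the exact timings will vary by machine):

import timeit

setup = """
import numpy
number1 = numpy.uint64(80000)
shift_amount = numpy.uint64(8)
"""

# option 1: construct the numpy.uint64 shift on every iteration
t1 = timeit.timeit("number1 >> numpy.uint64(8)", setup=setup, number=10000)

# option 2: reuse the pre-built shift_amount
t2 = timeit.timeit("number1 >> shift_amount", setup=setup, number=10000)

print(t1, t2)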

My current program calls a function for each chunk of data that processes the raw bits. This function is called millions of times and appends to a few different lists. Following the same idea, and just using globally defined values for the shift/bit operations, I tested two more loop conditions:

import numpy as np

shift_amount = np.uint64(54)

def using_global(number1):
    # 'global' is only needed for assignment; reading works without it
    global shift_amount
    number1 >> shift_amount

def using_local(number1):
    shift = np.uint64(54)  # construct the shift on every call
    number1 >> shift

Looping these 10,000 times, the function using the global was always an order of magnitude faster. Question: is it bad practice to have a bunch (10+) of global variables? https://wiki.python.org/moin/PythonSpeed/PerformanceTips states that local variables will be faster, but in this instance I found that not to be the case. My main loop simply calls the function once for each of the millions of data words, so that's probably inefficient too.

Python is not made for massive number operations; numpy is. Put all data chunks in one numpy array. This is far faster than using loops and single function calls:

import numpy

values = numpy.ndarray(1000000, dtype=numpy.uint64)
# fill in the data
values >>= 8
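As an example of filling that array in one step: if the 64-bit chunks live in a raw binary file, numpy can read them all at once (a sketch assuming a hypothetical file data.bin in native byte order, not something from the original answer):

import numpy

# read every 64-bit word in a single call instead of chunk-by-chunk
values = numpy.fromfile("data.bin", dtype=numpy.uint64)
values >>= 8  # one vectorized shift over all the words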

If, for example, your shift depends on the highest nibble, you can build a lookup table of shift amounts indexed by the nibble values 0 to 15:

# one shift amount per possible value of the top nibble (0-15)
shift_by_nibble = numpy.array([8,16,24,30,34,60,50,40,44,48,52,56,62,4,12,20], dtype=numpy.uint8)
# values >> 60 extracts the top nibble of each 64-bit word,
# which then indexes the table for an elementwise shift
values >>= shift_by_nibble[values >> 60]

using_global references a global variable. using_local references a local variable, but it also includes a call to np.uint64() on every invocation, and it is that construction cost, not the variable lookup, that affects performance.
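One common way to keep the fast local lookup without paying for the construction on every call is a default argument, which is evaluated once at function-definition time. A sketch of my own, not from the answer:

import numpy as np

def using_default(number1, shift=np.uint64(54)):
    # shift is bound once when the function is defined,
    # then read as an ordinary fast local on every call
    return number1 >> shift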

Unless I'm misunderstanding the issue, another valid option would be to pass the variable shift_amount into the function:

def my_func(number1, shift_amount):
    return number1 >> shift_amount

and then calling the function as new_number = my_func(old_number, shift_amount).
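This also keeps the fast lookup: in CPython, function parameters are ordinary local variables, so reading shift_amount inside the function costs no more than reading any other local, and the numpy.uint64 is still constructed only once, outside the hot loop.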
