
Should overwriting variable names give any performance benefits?

When parsing objects out of large datasets, I often find myself aggregating information into set objects, then transforming them into lists to sort them.

For example, here is a typical code snippet:

all_times = set()

for row in dataset:
    time = parse_out_time(row)
    all_times.add(time)

sorted_times = sorted(list(all_times))

My question is about that last assignment. I could instead reassign the same variable name to the sorted list:

all_times = sorted(list(all_times))

I know that Python has automatic garbage collection to remove the data assigned to old variable names that are not reused. Reassigning the name seems like it would allow the Python interpreter to immediately deallocate the memory belonging to the old set version of all_times. If we ran the above code in a loop over a few million datasets, this could be important.

Should writing over variable names that you will never use again give any performance benefits? Or is Python's garbage collector smart enough to immediately deallocate the memory for variables that are never used again by the script?

Python doesn't do any static analysis of the code. It will keep the reference that a variable holds on an object until that variable goes out of scope (for instance, on return), the variable is reassigned (all_times = sorted(list(all_times))), or it is deleted (del all_times). In the case of a set, you could also call all_times.clear() to get rid of the contained data. All four are reasonable ways to get rid of a container that is no longer needed.
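A minimal sketch of the reassignment and del cases, assuming CPython, where an object is freed as soon as its reference count reaches zero (other implementations may free it later). TrackedSet and build_times are hypothetical names introduced only so the set can be observed through a weak reference:

```python
import weakref


class TrackedSet(set):
    """Hypothetical set subclass, used only so the object can be weakly referenced."""


def build_times():
    # Stand-in for the parsing loop in the question.
    return TrackedSet([3, 1, 2])


# Case 1: rebinding the same name.
all_times = build_times()
probe = weakref.ref(all_times)       # a weak reference does not keep the set alive
all_times = sorted(all_times)        # the name now points at the list
freed_after_rebind = probe() is None

# Case 2: a separate name for the sorted list, then an explicit del.
other_times = build_times()
probe2 = weakref.ref(other_times)
sorted_times = sorted(other_times)
alive_before_del = probe2() is not None   # the set is still bound to other_times
del other_times
freed_after_del = probe2() is None
```

Under that CPython assumption, the set is released at the moment the name is rebound in the first case, and only at the del in the second.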

Notice in any case that the contained data is still in sorted_times. All you got rid of is the hash table used by the set. It's likely not that big of a help either way.
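A rough way to see this, assuming made-up string timestamps as sample data: the sorted list holds references to the very same element objects as the set, so dropping the set only reclaims the set's own container. sys.getsizeof reports the size of the container alone, not of the elements it refers to.

```python
import sys

# Made-up sample data: 28 date strings standing in for parsed times.
all_times = {f"2020-01-{day:02d}" for day in range(1, 29)}
sorted_times = sorted(all_times)

# Every element of the list is the same object that the set refers to.
set_ids = {id(t) for t in all_times}
shares_elements = all(id(t) in set_ids for t in sorted_times)

# Container overhead only; the strings themselves are not counted.
set_overhead = sys.getsizeof(all_times)
list_overhead = sys.getsizeof(sorted_times)
print(shares_elements, set_overhead, list_overhead)
```

The exact byte counts depend on the Python version and platform, so they are printed rather than stated here.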

I don't think it would cost that many resources if you do it right (depending on your needs).

I mean, if you run this code in a loop, you will still only use 2 variables, because each iteration rebinds the same two names.

Things will be slightly different if you put this code into a function and call the function in a loop. And it would consume a little more if you run this function as multithreaded tasks.
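A sketch of the function variant, assuming each row is a (time, payload) tuple. parse_out_time is named in the question but never defined, so the version here is a hypothetical stand-in, as is unique_sorted_times. Because all_times is a local variable, it goes out of scope when the function returns and only the sorted list survives:

```python
def parse_out_time(row):
    # Hypothetical parser: assumes the time is the first field of the row.
    return row[0]


def unique_sorted_times(dataset):
    # all_times is local, so the set is released when the function returns.
    all_times = {parse_out_time(row) for row in dataset}
    return sorted(all_times)


# Made-up sample datasets for illustration.
datasets = [
    [(3, "a"), (1, "b"), (3, "c")],
    [(2, "x"), (2, "y")],
]
results = [unique_sorted_times(ds) for ds in datasets]
```

With this layout there is no need to reuse or delete the name by hand, since each call starts with fresh locals.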
