
PC going unresponsive when looping over 10^8 elements in an array

When I execute Python code that loops over an array of size 10^8, the PC becomes unresponsive and takes around 10 minutes to finish the script. After the script is done, the machine stays laggy for a while.

So is this a problem caused by a weak processor, or is it a RAM problem? And is upgrading the RAM the only way to fix it?

The script could be as simple as:

arr = [x for x in range(pow(10, 8))]
for i in range(len(arr)):
    arr[i] += 1

My specs are: 8 GB RAM, Ubuntu, Python 3.6, Intel Core i7-3632QM @ 2.20 GHz.

More details on what exactly happens: when I run the script, I can see the memory usage of the Python process keep climbing. Then the PC becomes unresponsive. It doesn't respond to any input I give: if anything was playing in the background it stops, and if I move the mouse the cursor won't move, until the script is actually done. Then it becomes responsive again but stays very laggy for a while; if I try to switch to another minimized application, it takes quite some time, as if the PC had just booted up. It takes a bit of time for everything to get back to normal.

What's happening here is very likely paging / swapping¹. Due to virtual address space, each process on your system can address a huge amount of memory, way more than you physically have in your computer. If all processes together use more memory than is physically available, the operating system is in trouble. One approach is paging: moving data belonging to some processes from memory to disk.
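One way to watch this happen (a sketch, Linux-specific: it parses `/proc/meminfo`, which exists on the asker's Ubuntu system but not on other platforms) is to check available RAM and free swap before and during the run:

```python
def meminfo(fields=("MemAvailable", "SwapFree")):
    """Return selected /proc/meminfo fields in kB, or {} on non-Linux systems."""
    info = {}
    try:
        with open("/proc/meminfo") as f:
            for line in f:
                key, _, rest = line.partition(":")
                if key in fields:
                    info[key] = int(rest.split()[0])  # values are reported in kB
    except FileNotFoundError:
        pass  # /proc/meminfo only exists on Linux
    return info

print(meminfo())
```

If `MemAvailable` drops toward zero while `SwapFree` starts shrinking, the system has begun paging your process out to disk.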

Since your disk, even if it's an SSD, is several orders of magnitude slower than RAM, the system becomes unresponsive. Say, for example, the OS decides to move the block of memory containing your mouse cursor position onto the disk: every time it updates the cursor, this introduces a huge delay. Even after the process that consumed all the memory finishes, it takes some time to load data back from disk to RAM.

To illustrate, on my system with a comparable processor (i5-3320M), your example code finishes in a mere 20 seconds without any impact on overall system responsiveness. That is because I have 16 GiB of RAM. So clearly it is not about the "CPU [being saturated] with billions of operations". Given that you have a quad-core processor and that this code uses only one thread, you have lots of spare compute cycles. Even if you were to use up all the CPU cycles, the system would usually remain quite responsive, because the OS scheduler does a good job of balancing CPU time between your compute task and the process moving your mouse cursor.

Python is particularly prone to this issue because it uses far more memory than strictly necessary. Python 3.6.1 on my system uses ~4 GiB for the data in arr, even though 10^8 64-bit integers would need only 800 MB. That's because everything in Python is an object. You can be more memory-efficient if you avoid permanently storing anything in memory in the first place, or if you use numpy. But discussing that would require a more problem-oriented code example.
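You can measure this overhead yourself with `sys.getsizeof` (a sketch; the exact byte counts vary by platform and CPython version, and a smaller N is used so it runs quickly). Each list slot stores a pointer, and each element is a full int object of roughly 28 bytes, versus the 8 bytes a raw 64-bit integer needs:

```python
import sys

N = 10**6  # scaled down from 10**8 so the demo runs quickly
arr = list(range(N))

list_bytes = sys.getsizeof(arr)                  # the list's pointer array
int_bytes = sum(sys.getsizeof(x) for x in arr)   # the int objects themselves

total = list_bytes + int_bytes
print(f"~{total / N:.0f} bytes per element, vs 8 bytes for a raw int64")
```

Multiplying that per-element cost out to 10^8 elements is how a "800 MB" dataset balloons into several GiB of actual memory.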

¹ There are differences between paging and swapping, but nowadays the terms are used interchangeably.

The short answer is: your application becomes unresponsive because you've totally saturated your CPU with billions of operations being performed on a large dataset. While your program is stuck in those loops, it can't do anything else and appears to lock up.

First, you're creating 100 million items using range(). This operation alone isn't going to be very fast, because that's a lot of items.

Next, you're looping over those 100 million items with a list comprehension and building an entirely new list. The comprehension seems pointless, since you're just passing the value from range straight through, but perhaps you've simplified it for the example.

Finally, you're using a for loop to once again loop over all the items in the newly generated list.

That's three loops: one inside range() (which, in Python 2, also builds a full list), another in the list comprehension (which builds a second list), and the final for loop. You're doing a lot of work, creating huge lists multiple times.

Appending an item to a list takes several operations, and at 100 million items that's 10^8 times the per-item operation count. For example, if each item took 10 operations, you'd be looking at on the order of a billion operations' worth of processing time. These aren't real benchmarks, but they illustrate how lots of little operations like this can quickly add up to a lot of CPU time. On top of that, copying at minimum 100 MB of data around in memory several times takes additional time as well. All of this leads to a lack of responsiveness because your CPU is saturated.

If you absolutely need to pre-build such a huge list, make sure you only loop over it once and do all the work you need on each item at that time. That cuts down on the number of times you recreate the list and saves memory, since fewer lists need to be held in memory at the same time.
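As a sketch of that single-pass idea (scaled down to a smaller N so it runs quickly), the build-then-mutate pattern from the question collapses into one comprehension that does the increment while building the list:

```python
N = 10**6  # scaled down from 10**8 for a quick demonstration

# Two passes: build the list, then mutate every element in place.
arr = [x for x in range(N)]
for i in range(len(arr)):
    arr[i] += 1

# One pass: do the work on each item as the list is built.
arr_one_pass = [x + 1 for x in range(N)]

assert arr == arr_one_pass  # same result, one traversal instead of two
```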

If all you really need is an incrementing number, you can use a generator to count up. Generators are far more efficient because they are "lazy": they yield a single value at a time rather than returning a whole list at once. In Python 2, xrange() is a range generator that works exactly like range() except that it yields one value at a time instead of creating a whole list up front and returning it.

for i in xrange(pow(10,8)):
    # do some work with the current value of i

In Python 3, there is no xrange(), since range() is lazy by default (technically it returns a range object rather than a generator, but it behaves much the same way for iteration).
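A quick way to see that laziness (a sketch; exact byte counts vary slightly across CPython versions): a Python 3 range over 10^8 numbers occupies a constant few dozen bytes, because it stores only start, stop, and step, while materializing even a small slice of it as a list costs far more:

```python
import sys

lazy = range(10**8)
print(sys.getsizeof(lazy))        # a few dozen bytes, regardless of length

materialized = list(range(10**4)) # building a real list allocates per item
print(sys.getsizeof(materialized))
```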

Here's an explanation of the difference between range() and xrange():

http://pythoncentral.io/how-to-use-pythons-xrange-and-range/

Lastly, if you really do need huge numeric lists like this, the NumPy library stores numbers in compact, fixed-type arrays that behave like regular lists but hold millions of items far more efficiently, and its vectorized operations avoid Python-level loops entirely.
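A minimal sketch of the NumPy approach, assuming NumPy is installed (`pip install numpy`) and scaled down for a quick run: the whole increment from the question becomes a single vectorized operation over a contiguous 8-bytes-per-element buffer, instead of a Python-level loop over millions of int objects.

```python
import numpy as np

N = 10**6  # scaled down from 10**8 for a quick demonstration

a = np.arange(N, dtype=np.int64)  # contiguous buffer, 8 bytes per element
a += 1                            # one vectorized pass in C, no Python loop

print(a.nbytes)                   # exactly 8 * N bytes of data
```

At 10^8 elements this is the difference between an 800 MB buffer and the multi-GiB object graph a plain Python list produces, which is what pushed the asker's 8 GB machine into swap.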
