Is there a faster way to do a while loop in python?

I just spent an hour scouring my code to find out why it was dogging after I rewrote a bunch of it. It didn't occur to me that while loops were so slow. I guess I've never used them in a place where time is very critical.

I finally narrowed the problem down to this method, and after testing, I found that the two lines inside the while loop run very fast: about 30/100000 to 1/10000 of a second. But when I put my datetime calls directly outside the while loop, it slows down to about 1 second.

def query(self, command):
    result = ''
    result_list = []
    self.obd.write(command + "\r\n")
    while result != '>':
        result = self.obd.readline()
        result_list.append(result)
    self.result = result_list[-2].strip()

Why are while loops so insanely slow, and how would I speed this up?

To explain what I'm doing, I am getting serial input from a device that seems to have a mind of its own in terms of how many lines it outputs. Sometimes the information I need is on the second line, sometimes it's on the third, and sometimes the first. All I know for sure is that it is the last line before the ">" sign. Other methods I've tried leave me with unread buffered data that messes me up later, so I have to wait for the ">".

EDIT: Apparently I didn't explain well enough what I did.

I started with the code above, and I edited it to check how fast it was running.

def query(self, command):
    result = ''
    result_list = []
    self.obd.write(command + "\r\n")
    while result != '>':
        a = datetime.datetime.now()
        result = self.obd.readline()
        result_list.append(result)
        b = datetime.datetime.now()
        print b - a
    self.result = result_list[-2].strip()

This reports an average of less than 1/10000 of a second for each pass through the loop.

def query(self, command):
    result = ''
    result_list = []
    self.obd.write(command + "\r\n")
    a = datetime.datetime.now()
    while result != '>':
        result = self.obd.readline()
        result_list.append(result)
    b = datetime.datetime.now()
    print b - a
    self.result = result_list[-2].strip()

This says 1+ seconds each time it runs this method.

What is happening inside the while loop is that the serial port is being read from. If I do a for loop around it, it works for a while, then stops when the buffer gets a little behind, but I can query the port at up to 60 Hz.

If it isn't the while loop, why am I getting the results I am seeing?

To clarify, while loops in all interpreted languages (like Python) are very slow compared to their implementations in compiled languages (like C). This is also true of for loops, and others. The reason is that interpreted languages have a high per-statement overhead, while most compiled languages have little or no such overhead. This overhead is incurred on each iteration of the loop, even when using a list comprehension. There are at least three ways to optimize or mitigate loops in interpreted languages:

  • Optimize each loop iteration to brute-force a faster run time.
  • Use built-in operations which are well-optimized for the task.
  • Use libraries with "vectorized" functions like those available in numpy. (Best solution when reading/writing/operating on numeric data.) These libraries are usually written partly in compiled code to speed up repetitive operations.
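As a rough illustration of the second bullet, the sketch below times an explicit while loop against the built-in sum() on the same data. The function names and the data size are made up for the example, and the actual timings will vary by machine and interpreter.

```python
import timeit

def total_with_while(values):
    """Sum a list with an explicit while loop (every step runs in the interpreter)."""
    total = 0
    i = 0
    while i < len(values):
        total += values[i]
        i += 1
    return total

def total_with_builtin(values):
    """Sum the same list with the built-in sum(), which iterates in C in CPython."""
    return sum(values)

data = list(range(100000))  # arbitrary sample size, chosen for illustration

loop_seconds = timeit.timeit(lambda: total_with_while(data), number=20)
builtin_seconds = timeit.timeit(lambda: total_with_builtin(data), number=20)
print("while loop: %.4f s, built-in sum: %.4f s" % (loop_seconds, builtin_seconds))
```

Both functions return the same total; only the amount of work done by the interpreter per element differs.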

In your case, I'd suggest either the first one (optimizing the inside by only storing a scalar instead of an array):

def query(self, command):
    result = ''
    line = ''
    self.obd.write(command + "\r\n")
    while line != '>':
        result = line
        line = self.obd.readline()
    self.result = result.strip()

The append function takes more time than simple scalar assignment, so there is a slight time saving, and your code already ignores all but the second-to-last line.

Or, you could try using a well-optimized built-in function. If obd supports readline(), there's a good chance it's a file-like object that will also support readlines() or read(). Using re.search with the result of read() can sometimes be faster, depending on the length and complexity of the data:

import re

def query(self, command):
    self.obd.write(command + "\r\n")
    # Capture the line directly before a line that is just '>'
    result = re.search('(?:^|\n)([^\n]*?)\n>(\n|$)', self.obd.read())
    self.result = result.group(1) if result else None

The regex there isn't as complex as it seems. It just searches for a line followed by a second line equal to >. It's also not terribly efficient.
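To see what the pattern captures, here is a small standalone sketch that applies it to a string instead of a device. The sample text is invented for illustration and uses plain "\n" line endings; a real device may use "\r" or "\r\n", in which case the pattern would need adjusting.

```python
import re

# Same pattern as above: capture the line directly before a line that is just ">".
PROMPT_PATTERN = re.compile(r'(?:^|\n)([^\n]*?)\n>(\n|$)')

# Hypothetical device output, made up for this example.
sample = "SEARCHING...\n41 0C 1A F8\n>"

match = PROMPT_PATTERN.search(sample)
last_line = match.group(1) if match else None
print(last_line)
```

If no line consisting only of ">" is present, search() returns None, which is why the result is checked before calling group(1).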

A final approach is to use non-regex built-ins to reduce the number of times your while loop has to run:

def query(self, command):
    self.obd.write(command + "\r\n")
    remaining = self.obd.read().lstrip()
    sep = ''
    # Each pass jumps straight to the next '>' that starts a line
    while remaining and remaining[0] != '\n':
        chunk, sep, remaining = remaining.partition('\n>')
    self.result = chunk.rpartition('\n')[2] if sep else ''

That will only run the while loop once for each > that comes at the beginning of a line, which might be only once in total.
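The same partition logic can be tried on plain strings without a device. This is a sketch only: last_line_before_prompt is a hypothetical helper name, and the sample inputs are invented.

```python
def last_line_before_prompt(text):
    """Return the line directly before a line consisting only of '>'."""
    remaining = text.lstrip()
    chunk = sep = ''
    # Each pass skips ahead to the next '>' that begins a line.
    while remaining and remaining[0] != '\n':
        chunk, sep, remaining = remaining.partition('\n>')
    # sep is empty when no '\n>' was found at all.
    return chunk.rpartition('\n')[2] if sep else ''

print(last_line_before_prompt("SEARCHING...\n41 0C 1A F8\n>"))
print(last_line_before_prompt("A\n>foo\nB\n>\nC\n"))
```

In the second call the loop runs twice: the first ">" is followed by more text on the same line, so the search continues to the next one.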

Note that the last two changes (regex and using partition) both rely on first reading the file-like object in its entirety. There are two side effects to be aware of:

  • Reading the whole file takes as much memory as adding the whole file to a list, so there is no memory saving compared to your previous approach, only a time saving.
  • Because the whole file is read, the bytes/lines after the > are also read, and it will fail if obd doesn't send an EOF signal (for example, if it's a pipe that doesn't close). Be aware of that, especially if you intend to have other code continue reading from obd.

While loops aren't slow. In fact, the overhead of a while loop is virtually imperceptible. Your measurements must be off if you think the statements run fast outside of the loop.

Reading data from a file or serial device is one of the slowest things you can do. Since your while loop has a readline call in it, that's probably what's slowing it down. Perhaps it is waiting for a line of input, and that waiting is what takes the time.
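One way to check this is to add up the time spent inside readline and compare it with the time for the whole loop. The sketch below does that against a fake device; SlowFakeDevice and its 0.05 second delay are invented stand-ins for a real serial port, not part of any library.

```python
import time

class SlowFakeDevice(object):
    """Hypothetical stand-in for a serial port; each readline waits before returning."""

    def __init__(self, lines, delay):
        self._lines = list(lines)
        self._delay = delay

    def readline(self):
        time.sleep(self._delay)  # simulate waiting for bytes to arrive
        return self._lines.pop(0)

def timed_query(device):
    """Read lines until '>' and return (seconds inside readline, seconds for the loop)."""
    in_readline = 0.0
    line = ''
    loop_start = time.time()
    while line != '>':
        call_start = time.time()
        line = device.readline()
        in_readline += time.time() - call_start
    return in_readline, time.time() - loop_start

device = SlowFakeDevice(["SEARCHING...", "41 0C 1A F8", ">"], delay=0.05)
in_readline, total = timed_query(device)
print("in readline: %.3f s, whole loop: %.3f s" % (in_readline, total))
```

With this fake device, nearly all of the loop's time is spent inside readline, and the while loop itself contributes almost nothing.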

Your question mentions moving a datetime call from inside the loop to outside, but I don't see any datetime function anywhere, so it's hard to speculate whether that's part of the problem.

I'm not sure exactly what the problem was in the first place, which is frustrating, but not frustrating enough for me to pick apart someone else's code. I did finally get a solution that works though.

I decided that instead of using readline(), I would make use of read(1), which reads one byte from the buffer on each call. Doing that, I was able to wait for the ">" character and return the previous line.

Here's the new method:

def query(self, command):
    line = ''
    self.obd.write(command + "\r\n")
    while True:
        c = self.obd.read(1)
        line += c
        if c == '>':
            break
    # should work even if there is no newline character
    self.result = line.split('\r')[-2].strip()

This works in the same amount of time as the previous method I was using with a for loop, i.e. ~60 Hz, but is much less likely to allow the buffer to fill up with garbage.
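For anyone who wants to try the byte-at-a-time idea without hardware, here is a standalone variation that can be run against an in-memory stream. It is a sketch under a few assumptions: read_reply is a hypothetical helper, the port returns bytes (as in Python 3), an empty read means a timeout or end of stream, and the sample reply is invented. Unlike the method above, it returns the last non-empty line rather than indexing with [-2].

```python
import io

def read_reply(port, prompt=b'>'):
    """Read one byte at a time until the prompt; return the last non-empty line before it."""
    buffer = b''
    while True:
        c = port.read(1)
        if not c:
            # Assumed to mean timeout or end of stream: stop instead of looping forever.
            break
        if c == prompt:
            break
        buffer += c
    # Lines are separated by '\r'; strip() also removes any '\n' left over.
    lines = [part.strip() for part in buffer.split(b'\r')]
    lines = [part for part in lines if part]
    return lines[-1].decode('ascii') if lines else ''

# io.BytesIO stands in for the serial port here.
fake_port = io.BytesIO(b"SEARCHING...\r\n41 0C 1A F8\r\n\r\n>")
print(read_reply(fake_port))
```

The empty-read check matters because a port opened with a timeout may return no data, and without it the loop would never end.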

Thanks for all the help. It got me on the right track.
