Is there a faster way to do a while loop in python?

Question

I just spent an hour scouring through my code to find out why it was dogging after I rewrote a bunch of it. It didn't occur to me that while loops were so slow. I guess I've never used them in a place where time is very critical.

I finally narrowed the problem down to this method. After testing, I found that the two lines inside the while loop run very fast, about 1/10000 to 3/10000 of a second per pass, but when I move my datetime calls to directly outside the while loop, the measured time goes up to about 1 second.

def query(self, command):
    result = ''
    result_list = []
    self.obd.write(command + "\r\n")
    while result != '>':
        result = self.obd.readline()
        result_list.append(result)
    self.result = result_list[-2].strip()

Why are while loops so insanely slow, and how would I speed this up?

To explain what I'm doing, I am getting serial input from a device that seems to have a mind of its own in terms of how many lines it outputs. Sometimes the information I need is on the 2nd line, and sometimes it's on the third, sometimes the first. All I know for sure is that it is the last line before the ">" sign, and other methods I've tried leave me with unread buffered data that messes me up later, so I have to wait for the ">".

EDIT: Apparently I didn't explain well enough what I did.

I started with the code above, and I edited it to check how fast it was running.

def query(self, command):
    result = ''
    result_list = []
    self.obd.write(command + "\r\n")
    while result != '>':
        a = datetime.datetime.now()
        result = self.obd.readline()
        result_list.append(result)
        b = datetime.datetime.now()
        print(b - a)
    self.result = result_list[-2].strip()

This takes an average of less than 1/10000 second to run each time it runs this method.

def query(self, command):
    result = ''
    result_list = []
    self.obd.write(command + "\r\n")
    a = datetime.datetime.now()
    while result != '>':
        result = self.obd.readline()
        result_list.append(result)
    b = datetime.datetime.now()
    print(b - a)
    self.result = result_list[-2].strip()

This says 1+ seconds each time it runs this method.

What is happening inside the while loop is that the serial port is being read from. If I use a for loop instead, it works for a while and then stops when the buffer falls a little behind, but I can query the port at up to 60 Hz.

If it isn't the while loop, why am I getting the results I am seeing?

Answer 1

To clarify, while loops in all interpreted languages (like Python) are very slow compared to their implementations in compiled languages (like C). This is also true of for loops and other loop constructs. The reason is that interpreted languages have a high per-statement overhead, while most compiled languages have little or no such overhead. This overhead is incurred on each iteration of the loop, even when using a list comprehension. There are at least three ways to optimize or mitigate loops in interpreted languages:

Optimize each loop iteration to brute-force a faster run time.
Use built-in operations which are well-optimized for the task.
Use libraries with "vectorized" functions like those available in numpy. (Best solution when reading/writing/operating on numeric data.) These libraries are usually written partly in compiled code to speed up repetitive operations.
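As a rough illustration of the second option, here is a minimal sketch comparing an explicit while loop with the built-in sum(). The function name total_with_while and the list size are made up for the example, and the timings will vary by machine, so treat the printed numbers as indicative only.

```python
import timeit

def total_with_while(values):
    # Explicit loop: every iteration pays the interpreter's per-statement cost.
    total = 0
    i = 0
    while i < len(values):
        total += values[i]
        i += 1
    return total

values = list(range(100000))

# Both approaches compute the same answer.
assert total_with_while(values) == sum(values)

# Time ten runs of each; sum() does its looping in compiled code.
loop_time = timeit.timeit(lambda: total_with_while(values), number=10)
builtin_time = timeit.timeit(lambda: sum(values), number=10)
print("while loop: %.4f s, sum(): %.4f s" % (loop_time, builtin_time))
```

Note that this kind of saving only matters when the loop body is pure computation; it says nothing about loops that spend their time waiting on I/O.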

In your case, I'd suggest either the first one (optimizing the inside by only storing a scalar instead of an array):

def query(self, command):
    result = ''
    line = ''
    self.obd.write(command + "\r\n")
    while line != '>':
        result = line
        line = self.obd.readline()
    self.result = result.strip()

The append function takes more time than simple scalar assignment, so there is a slight time saving, and your code already ignores all but the second-to-last line.
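To try this version without the hardware attached, something like the following sketch could be used. FakeOBD and Reader are hypothetical stand-ins invented for this example (they are not part of any serial library), and the canned lines are only a guess at what the device might send.

```python
class FakeOBD(object):
    """Hypothetical stand-in for the serial port; replays canned lines."""

    def __init__(self, lines):
        self.lines = list(lines)
        self.written = []

    def write(self, data):
        # Record what was sent so it can be inspected afterwards.
        self.written.append(data)

    def readline(self):
        # Return the next canned line, or '' once they run out.
        return self.lines.pop(0) if self.lines else ''


class Reader(object):
    """Hypothetical owner of the query() method from above."""

    def __init__(self, obd):
        self.obd = obd
        self.result = None

    def query(self, command):
        result = ''
        line = ''
        self.obd.write(command + "\r\n")
        while line != '>':
            result = line              # remember the line before the current one
            line = self.obd.readline()
        self.result = result.strip()


reader = Reader(FakeOBD(["010C\r\n", "41 0C 1A F8\r\n", ">"]))
reader.query("010C")
print(reader.result)
```

With these canned lines, the value kept is the last line read before the ">" prompt, which is the same thing result_list[-2] picked out in the original.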

Or, you could try using a well-optimized built-in function. If obd supports readline(), there's a good chance it's a file-like object that will also support readlines() or read(). Using re.search with the result of read() can sometimes be faster, depending on the length and complexity of the data:

def query(self, command):
    # assumes `import re` at the top of the module
    self.obd.write(command + "\r\n")
    result = re.search(r'(?:^|\n)([^\n]*?)\n>(\n|$)', self.obd.read())
    self.result = result.group(1) if result else None

The regex there isn't as complex as it seems. It just searches for a line followed by a second line equal to >. It's also not terribly efficient.
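To see what the pattern captures, here is a small sketch run against a few made-up strings. The helper name line_before_prompt and the sample responses are assumptions for illustration, and the samples use plain "\n" line endings; with "\r\n" endings the captured group would still carry a trailing "\r" that needs stripping.

```python
import re

# Same pattern as above: a full line, then a newline, then ">" alone on a line.
PROMPT_RE = re.compile(r'(?:^|\n)([^\n]*?)\n>(\n|$)')

def line_before_prompt(text):
    # Return the line directly before the ">" prompt, or None if there is none.
    match = PROMPT_RE.search(text)
    return match.group(1) if match else None

print(line_before_prompt("010C\n41 0C 1A F8\n>"))
print(line_before_prompt("SEARCHING...\n41 0C 1A F8\n>\n"))
print(line_before_prompt("no prompt here"))
```

The first two calls pick out the "41 0C 1A F8" line; the last one returns None because no line consists of ">" alone.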

A final approach is to use non-regex built-ins to reduce the number of times your while loop has to run:

def query(self, command):
    self.obd.write(command + "\r\n")
    remaining = self.obd.read().lstrip()
    sep = ''
    while remaining and remaining[0] != '\n':
        chunk, sep, remaining = remaining.partition('\n>')
    self.result = chunk.rpartition('\n')[2] if sep else ''

That will only run the while once for each > that comes at the beginning of a line, which might only be once at all.
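The partition logic can be checked on its own, away from the device. In this sketch the function name last_line_before_prompt and the sample strings are hypothetical; the loop is the same as above except that chunk is initialised up front so the function is safe on empty input.

```python
def last_line_before_prompt(text):
    remaining = text.lstrip()
    chunk = ''
    sep = ''
    # Keep splitting on "\n>" until the ">" found is alone on its line
    # (followed by a newline or by the end of the data).
    while remaining and remaining[0] != '\n':
        chunk, sep, remaining = remaining.partition('\n>')
    # The wanted line is whatever follows the last newline in chunk.
    return chunk.rpartition('\n')[2] if sep else ''

print(last_line_before_prompt("010C\n41 0C 1A F8\n>"))
print(last_line_before_prompt("a\n>b\nc\n>\n"))
print(repr(last_line_before_prompt("no prompt here")))
```

The second sample shows the case where a ">" starts a line but is not the prompt; the loop runs a second time and settles on "c".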

Note that the last two changes (regex and using partition) both rely on first reading the file-like object in its entirety. There are two side effects to be aware of:

Reading the whole file takes as much memory as adding the whole file to a list, so there is no memory saving compared to your previous approach, only time savings.
Because the whole file is read, the bytes/lines after the > are also read, and it will fail if obd doesn't send an EOF signal (for example, if it's a pipe that doesn't close). Be aware of that, especially if you intend to have another reader continue reading from obd.

Answer 2

While loops aren't slow. In fact, the overhead of a while loop is virtually imperceptible. Your measurements must be off if you think the statements run fast outside of the loop.

Reading data from a file or serial device is one of the slowest things you can do. Since your while loop has a readline statement in it, that's probably what's slowing it down. Perhaps it is waiting for a line of input, and that waiting is what is slowing it down.
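One way to check this is to time the loop against a simulated device. The sketch below is written for Python 3 (it uses time.perf_counter); SlowDevice is a hypothetical class whose readline() just sleeps, and the 20 ms delay is an arbitrary assumption, not a measurement of any real hardware.

```python
import time

class SlowDevice(object):
    """Hypothetical device: each readline() blocks for `delay` seconds."""

    def __init__(self, lines, delay):
        self.lines = list(lines)
        self.delay = delay

    def readline(self):
        time.sleep(self.delay)  # simulate waiting for bytes to arrive
        return self.lines.pop(0)

def timed_read_loop(device):
    # Time the whole while loop, reads included.
    result = ''
    start = time.perf_counter()
    while result != '>':
        result = device.readline()
    return time.perf_counter() - start

def timed_empty_loop(iterations):
    # Time a while loop with the same number of passes but no I/O.
    i = 0
    start = time.perf_counter()
    while i < iterations:
        i += 1
    return time.perf_counter() - start

lines = ["010C\r\n", "41 0C 1A F8\r\n", ">"]
read_time = timed_read_loop(SlowDevice(lines, delay=0.02))
loop_time = timed_empty_loop(len(lines))
print("loop with reads: %.4f s, bare loop: %.6f s" % (read_time, loop_time))
```

The bare loop finishes in a tiny fraction of the time, so nearly all of the elapsed time in the first measurement comes from the blocking reads rather than from the while statement itself.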

Your question mentions moving a datetime call from inside the loop to outside but I don't see any datetime function anywhere, so it's hard to speculate whether that's part of the problem.

Answer 3

I'm not sure exactly what the problem was in the first place, which is frustrating, but not frustrating enough for me to pick apart someone else's code. I did finally get a solution that works though.

I decided that instead of using readline(), I would make use of read(1), which reads one byte from the buffer each call. Doing that, I was able to wait for the ">" character, and return the previous line.

Here's the new method:

def query(self, command):
    line = ''
    self.obd.write(command + "\r\n")
    while True:
        c = self.obd.read(1)
        line += c
        if c == '>':
            break
    # should work even if there is no newline character
    self.result = line.split('\r')[-2].strip()

This works in the same amount of time as the previous method I was using with a for loop, i.e. ~60 Hz, but is much less likely to allow the buffer to fill up with garbage.
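One thing to watch with this approach: if read(1) ever returns an empty string (for instance on a read timeout), the loop above would never end. Here is a minimal sketch of the same idea with a guard added. FakePort, read_until_prompt and max_empty_reads are names invented for this example, the sample response is an assumption about the device's output, and the characters are collected in a list and joined at the end instead of using +=.

```python
class FakePort(object):
    """Hypothetical stand-in for the serial port; read(1) yields one char."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def read(self, size=1):
        chunk = self.data[self.pos:self.pos + size]
        self.pos += len(chunk)
        return chunk  # '' once exhausted, similar to a read timeout

def read_until_prompt(port, prompt='>', max_empty_reads=3):
    chars = []
    empty_reads = 0
    while True:
        c = port.read(1)
        if not c:
            # Nothing arrived; give up after a few consecutive empty reads.
            empty_reads += 1
            if empty_reads >= max_empty_reads:
                raise IOError("no prompt received")
            continue
        empty_reads = 0
        chars.append(c)
        if c == prompt:
            break
    return ''.join(chars)

raw = read_until_prompt(FakePort("010C\r\n41 0C 1A F8\r\n>"))
print(raw.split('\r')[-2].strip())
```

With this sample data the last line printed is the one before the prompt, and a port that never sends ">" raises an error instead of hanging.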

Thanks for all the help. It got me on the right track.

3 answers

solution1
3 ACCEPTED 2014-04-05 02:31:46

solution2
2 2014-04-05 01:34:22

solution3
0 2014-04-05 20:29:32
