Use readlines() with indices or parse lines on the fly?

Question

I'm writing a simple test function that asserts that the output from an interpreter I'm developing is correct. It reads the expression to evaluate and the expected result from a file, much like Python's doctest. The interpreter is for Scheme, so an example input file would be

> 42
42

> (+ 1 2 3)
6

My first attempt for a function that can parse such a file looks like the following, and it seems to work as expected:

import os

def run_test(filename):
    interp = Interpreter()
    response_next = False
    num_tests = 0
    with open(filename) as f:
        for line in f:
            if response_next:
                assert response == line.rstrip('\n')
                response_next = False
            elif line.startswith('> '):
                num_tests += 1
                response = interp.eval(line[2:])
                response = str(response) if response else ''
                response_next = True
    print "{:20} Ran {} tests successfully".format(os.path.basename(filename),
                                                    num_tests)

I wanted to improve it slightly by removing the response_next flag, as I am not a fan of such flags, and instead read the next line inside the elif block with next(f). I had a small unrelated question about that, which I asked on IRC at freenode. I got the help I wanted, but I was also given the suggestion to use f.readlines() instead and then index into the resulting list. (I was also told that I could use groupby() from itertools for the pairwise lines, but I'll investigate that approach later.)

Now to the question. I was very curious why that approach would be better, but my Internet connection on the train was flaky and I was unable to ask, so I'll ask here instead. Why would it be better to read everything with readlines() instead of parsing each line on the fly as it is read?

I'm asking because my feeling is the opposite: it seems cleaner to parse the lines one at a time so that everything is finished in a single pass. I usually avoid indexing into lists in Python and prefer to work with iterators and generators. It may be impossible to guess what the person was thinking if it was a subjective opinion, but if there is some general recommendation I'd be happy to hear it.

Answer 1

It's certainly more Pythonic to process input iteratively rather than reading the whole input at once. For example, iterative processing still works when the input comes from a console, where the complete input is not available up front.

An argument in favour of reading a whole array and indexing is that using next(f) could be unclear when combined with a for loop; the options there would be either to replace the for loop with a while True or to fully document that you are calling next on f within the loop:

try:
    while True:
        test = next(f)
        response = next(f)
except StopIteration:
    pass
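As a rough sketch of the second option (Python 3, with io.StringIO standing in for the open file and a hypothetical pair_lines helper that is not part of the question's code), calling next on the same iterator the for loop is consuming could look like this:

```python
import io

def pair_lines(lines):
    """Yield (test, response) pairs from an iterable of lines.

    Hypothetical helper: assumes each '> ' line is immediately
    followed by its expected response.
    """
    it = iter(lines)              # one shared iterator for the loop and next()
    for line in it:
        if line.startswith('> '):
            # NOTE: this advances the same iterator the for loop uses,
            # so the loop itself never sees the response line.
            response = next(it, '')   # '' if the input ends after a test line
            yield line[2:].rstrip('\n'), response.rstrip('\n')

sample = io.StringIO("> 42\n42\n\n> (+ 1 2 3)\n6\n")
pairs = list(pair_lines(sample))
```

Passing a default to next avoids a StopIteration escaping from inside the generator if the file ends right after a test line.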

As Jonas suggests, you could accomplish this (if you're sure that the input will always consist of alternating test/response lines) by zipping the input with itself:

for test, response in zip(f, f):               # Python 3
for test, response in itertools.izip(f, f):    # Python 2
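A small self-contained sketch of the zip idea in Python 3, assuming io.StringIO as a stand-in for the file and assuming blank lines should be skipped first (the sample input in the question has them between pairs):

```python
import io

sample = io.StringIO("> 42\n42\n\n> (+ 1 2 3)\n6\n")

# Strip each line and drop blank ones, keeping a single lazy iterator.
lines = (line.strip() for line in sample)
non_blank = (line for line in lines if line)

# zip pulls from the same iterator twice per step, so consecutive
# lines come out as (test, response) pairs.
pairs = [(test[2:], response) for test, response in zip(non_blank, non_blank)]
```

Note that zip stops quietly when the iterator runs out, so a trailing test line without a response would be dropped rather than reported.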

Answer 2

from itertools import ifilter,imap

def run_test(filename):
    interp = Interpreter()
    num_tests, num_passed, last_result = 0, 0, None
    with open(filename) as f:
        # iterate over non-blank lines
        for line in ifilter(None, imap(str.strip, f)):
            if line.startswith('> '):
                last_result = interp.eval(line[2:])
            else:
                num_tests += 1
                try:
                    # compare against the repr of the most recent result
                    assert line == repr(last_result), \
                        "expected {}, got {!r}".format(line, last_result)
                except AssertionError as e:
                    print(e)
                else:
                    num_passed += 1
    print("Ran {} tests, {} passed".format(num_tests, num_passed))

... this simply assumes that any result-line refers to the preceding test.

I would avoid .readlines() unless you get some specific benefit from having the whole file available at once.

I also changed the comparison to look at the representation of the result, so it can distinguish between output types, i.e.

'6' + '2'
> '62'

60 + 2
> 62
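To make the difference concrete, here is a tiny Python sketch (the variable names are only illustrative) showing that str() collapses the two results to the same text while repr() keeps them apart:

```python
# str() hides the difference between a string and a number...
same_as_str = str('6' + '2') == str(60 + 2)

# ...while repr() keeps the quotes on the string result.
string_result = repr('6' + '2')
number_result = repr(60 + 2)
```

With repr(), the expected-result line in the test file has to include the quotes for a string result.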

Answer 3

Reading everything into an array gives you the equivalent of random access: You use an array index to move down the array, and at any time you can check what's next and back up if necessary.
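For comparison, a minimal sketch of the index-based style in Python 3 (io.StringIO stands in for the file; the pairing logic is an assumption based on the question's format):

```python
import io

sample = io.StringIO("> 42\n42\n\n> (+ 1 2 3)\n6\n")
lines = sample.readlines()    # whole input in memory as a list

pairs = []
i = 0
while i < len(lines):
    if lines[i].startswith('> '):
        # Peek at the following line by index; nothing is consumed,
        # so the code could still step back if it needed to.
        has_next = i + 1 < len(lines)
        expected = lines[i + 1].rstrip('\n') if has_next else ''
        pairs.append((lines[i][2:].rstrip('\n'), expected))
        i += 2                # skip past the response line
    else:
        i += 1                # blank or stray line: move on
```

The explicit index bookkeeping is the price paid for being able to look ahead or back up.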

If you can carry out your task without backing up, you don't need the random access and it would be cleaner to do without it. In your examples, it seems that your syntax is always a single-line (?) expression followed by the expected response. So, I'd have written a top-level loop that iterates once per expression-value pair, reading lines as necessary. If you want to support multi-line expressions and results, you can write separate functions to read each one: one that reads a complete expression, and one that reads a result (up to the next blank line). The important thing is that they should be able to consume as much input as they need and leave the input pointer in a reasonable state for the next read.
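One way this could be sketched in Python 3 is shown below. The function names read_expression and read_result are hypothetical, and the rule that an expression is complete once its parentheses balance is an assumption made for the sketch (it ignores parentheses inside string literals):

```python
import io

def read_expression(lines):
    """Read one expression: a '> ' line plus any continuation lines.

    Assumption for this sketch: an expression is complete once its
    parentheses balance. Returns None at end of input.
    """
    for line in lines:
        if line.startswith('> '):
            parts = [line[2:].rstrip('\n')]
            break
    else:
        return None               # no more expressions in the input
    depth = parts[0].count('(') - parts[0].count(')')
    while depth > 0:
        line = next(lines, None)
        if line is None:          # input ended mid-expression
            break
        line = line.strip()
        parts.append(line)
        depth += line.count('(') - line.count(')')
    return ' '.join(parts)

def read_result(lines):
    """Read result lines up to the next blank line (or end of input)."""
    result = []
    for line in lines:
        if not line.strip():
            break
        result.append(line.rstrip('\n'))
    return '\n'.join(result)

sample = io.StringIO("> (+ 1\n     2 3)\n6\n\n> 42\n42\n")
lines = iter(sample)              # both readers share this one iterator
pairs = []
while True:
    expression = read_expression(lines)
    if expression is None:
        break
    pairs.append((expression, read_result(lines)))
```

Each reader consumes only the lines it needs, so the shared iterator is always positioned at the start of the next item.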

solution1
1 2012-07-11 12:58:28

solution2
0 2012-07-11 15:49:16

solution3
0 ACCEPTED 2012-07-11 16:30:24
