
Use readlines() with indices or parse lines on the fly?

I'm writing a simple test function that asserts that the output of an interpreter I'm developing is correct. It reads the expression to evaluate and the expected result from a file, much like Python's doctest. The interpreter is for Scheme, so an example input file would be:

> 42
42

> (+ 1 2 3)
6

My first attempt at a function that parses such a file looks like the following, and it seems to work as expected:

import os

def run_test(filename):
    interp = Interpreter()
    response_next = False
    num_tests = 0
    with open(filename) as f:
        for line in f:
            if response_next:
                assert response == line.rstrip('\n')
                response_next = False
            elif line.startswith('> '):
                num_tests += 1
                response = interp.eval(line[2:])
                # treat only None as "no output", so falsy results such as 0 still compare correctly
                response = str(response) if response is not None else ''
                response_next = True
    print "{:20} Ran {} tests successfully".format(os.path.basename(filename),
                                                    num_tests)

I wanted to improve it slightly by removing the response_next flag, since I am not a fan of such flags, and instead read the next line inside the elif block with next(f). I had a small, unrelated question about that, which I asked on IRC at freenode. I got the help I wanted, but I was also advised to use f.readlines() instead and then index into the resulting list. (I was also told that I could use groupby() from itertools for the pairwise lines, but I'll investigate that approach later.)

Now to the question: I was very curious why that approach would be better, but I was on a train with a flaky Internet connection and was unable to ask, so I'll ask here instead. Why would it be better to read everything with readlines() instead of parsing each line on the fly as it is read?

I'm really wondering because my feeling is the opposite: it seems cleaner to parse the lines one at a time so that everything is finished in one pass. I usually avoid indexing into arrays in Python and prefer to work with iterators and generators. Maybe it is impossible to guess what the person was thinking if it was a subjective opinion, but if there is some general recommendation I'd be happy to hear it.

It's certainly more Pythonic to process input iteratively rather than reading the whole input at once; for example, iterative processing still works when the input comes from a console.

An argument in favour of reading everything into a list and indexing is that using next(f) can be unclear when combined with a for loop. The options there are either to replace the for loop with a while True loop or to document clearly that you are calling next on f within the loop:

try:
    while True:
        test = next(f)
        response = next(f)
except StopIteration:
    pass
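
The second option, keeping the for loop and documenting the extra next call, might look roughly like the sketch below. It is written for Python 3, and the names read_pairs and sample are made up for illustration; an in-memory io.StringIO stands in for the open file.

```python
import io

def read_pairs(lines):
    """Yield (expression, expected) pairs from an iterable of lines."""
    it = iter(lines)
    for line in it:
        if line.startswith('> '):
            # NOTE: this deliberately advances the same iterator that the
            # for loop is consuming, so the loop never sees the response line.
            expected = next(it, '')
            yield line[2:].rstrip('\n'), expected.rstrip('\n')

# Stand-in for a test file with the layout shown in the question.
sample = io.StringIO("> 42\n42\n\n> (+ 1 2 3)\n6\n")
pairs = list(read_pairs(sample))
```

Passing a default to next avoids a StopIteration escaping if the file ends right after an expression line.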

As Jonas suggests, you could accomplish this (if you're sure that the input will always consist of lines test/response/test/response etc.) by zipping the input with itself:

for test, response in zip(f, f):               # Python 3
for test, response in itertools.izip(f, f):    # Python 2
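
As a small illustration of the zip idiom in Python 3 (a sketch only; sample is an in-memory stand-in for the file, and the blank-line filter is an assumption added because the example file in the question separates tests with blank lines):

```python
import io

sample = io.StringIO("> 42\n42\n\n> (+ 1 2 3)\n6\n")

# Drop blank separator lines first; otherwise the pairing drifts out of step.
lines = (line.rstrip('\n') for line in sample if line.strip())

# zip pulls from the same iterator twice per step: first the test line,
# then the response line.
pairs = list(zip(lines, lines))
```

This works because both arguments are the same iterator object, so each step of zip consumes two consecutive lines.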

from itertools import ifilter, imap

def run_test(filename):
    interp = Interpreter()
    num_tests, num_passed, last_result = 0, 0, None
    with open(filename) as f:
        # iterate over non-blank lines
        for line in ifilter(None, imap(str.strip, f)):
            if line.startswith('> '):
                last_result = interp.eval(line[2:])
            else:
                num_tests += 1
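                try:
                    # compare against repr() so that '62' and 62 are told apart
                    assert line == repr(last_result), \
                        "expected {}, got {!r}".format(line, last_result)
                except AssertionError, e:
                    print e.message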
                else:
                    num_passed += 1
    print("Ran {} tests, {} passed".format(num_tests, num_passed))

... this simply assumes that any result line refers to the preceding test.
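
For readers on Python 3, where ifilter and imap no longer exist, the same loop can be sketched with the built-in filter and map. The names check and fake_eval are hypothetical; fake_eval is a lookup table standing in for interp.eval so the sketch runs on its own.

```python
import io

def check(lines, evaluate):
    """Return (num_tests, num_passed) for an iterable of doctest-style lines."""
    num_tests = num_passed = 0
    last_result = None
    # filter(None, ...) drops the empty strings left after stripping blank lines
    for line in filter(None, map(str.strip, lines)):
        if line.startswith('> '):
            last_result = evaluate(line[2:])
        else:
            num_tests += 1
            if line == repr(last_result):
                num_passed += 1
            else:
                print("expected {}, got {!r}".format(line, last_result))
    return num_tests, num_passed

# Hypothetical stand-in for interp.eval: looks answers up in a table.
fake_eval = {"42": 42, "(+ 1 2 3)": 6}.get
result = check(io.StringIO("> 42\n42\n\n> (+ 1 2 3)\n6\n"), fake_eval)
```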

I would avoid .readlines() unless you get some specific benefit from having the whole file available at once.

I also changed the comparison to look at the representation of the result, so it can distinguish between output types, i.e.

'6' + '2'
> '62'

60 + 2
> 62
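
The difference is easy to check directly in Python. In this short sketch (variable names are illustrative), str() makes the two results look identical, while repr() keeps the quotes on the string:

```python
text_result = '6' + '2'   # the string '62'
int_result = 60 + 2       # the integer 62

# str() hides the type difference; repr() preserves it.
same_str = str(text_result) == str(int_result)
same_repr = repr(text_result) == repr(int_result)
```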

Reading everything into an array gives you the equivalent of random access: you use an array index to move down the array, and at any time you can check what's next and back up if necessary.

If you can carry out your task without backing up, you don't need random access, and it would be cleaner to do without it. In your examples, the syntax seems to always be a single-line (?) expression followed by the expected response. So I'd write a top-level loop that iterates once per expression-value pair, reading lines as necessary. If you want to support multi-line expressions and results, you can write separate functions to read each one: one that reads a complete expression, and one that reads a result (up to the next blank line). The important thing is that they should be able to consume as much input as they need and leave the input pointer in a reasonable state for the next read.
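
A minimal Python 3 sketch of that structure follows. The function names read_expression, read_result, and read_cases are invented for illustration, as is the (show-two-lines) expression in the sample input. To keep it short, the expression reader still assumes a single '> ' line; only the result reader handles multiple lines.

```python
import io

def read_expression(lines):
    """Consume lines up to and including the next '> ' line.

    Returns the expression text, or None at end of input.
    """
    for line in lines:
        if line.startswith('> '):
            return line[2:].rstrip('\n')
    return None

def read_result(lines):
    """Consume lines up to the next blank line (or end of input)."""
    collected = []
    for line in lines:
        if not line.strip():
            break
        collected.append(line.rstrip('\n'))
    return '\n'.join(collected)

def read_cases(lines):
    """Top-level loop: one iteration per expression/result pair."""
    it = iter(lines)  # both readers share this one iterator
    while True:
        expr = read_expression(it)
        if expr is None:
            return
        yield expr, read_result(it)

sample = io.StringIO("> 42\n42\n\n> (show-two-lines)\nfirst\nsecond\n\n")
cases = list(read_cases(sample))
```

Each reader stops exactly where the next one should start, which is what leaves the input in a reasonable state without any indexing or backing up.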
