Python - Using readlines() to skip ahead n lines

Question

I'm having a go at fixing a broken library on GitHub that I want to use.

I have locally "fixed" the problem, but I don't think it's a very clean method.

I'm poking at the WARC library by the Internet Archive, specifically the arc.py part ( https://github.com/internetarchive/warc/blob/master/warc/arc.py ).

Since the library was written, the tools that make the ARC files have changed a bit. As a result, the built-in parser fails, because it isn't expecting to see some metadata in the file.

My local fix looks like this:

    if header.startswith("<arcmetadata"):
        while not header.endswith("</arcmetadata>\n"):
            header = self.fileobj.readline()
        header = self.fileobj.readline()
        header = self.fileobj.readline()

I'm not sure that calling readline() twice to strip off the next two empty lines (each containing only "\n") is the cleanest way of advancing through the file object.

Is this good Python, or is there a better way?

Answer 1

The code looks like a copy/paste error. There is nothing wrong with using .readline(), just document what you are doing:

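    # skip metadata
    if header.startswith("<arcmetadata"):
        while header and not header.endswith("</arcmetadata>\n"):
            header = self.fileobj.readline()
        # NOTE: header ends with "</arcmetadata>\n" here, i.e. it is not blank,
        # so read one more line before skipping blank lines
        header = self.fileobj.readline()

    # skip blank lines (readline() returns "" at EOF, which ends the loop)
    while header and not header.strip():
        header = self.fileobj.readline()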

By the way, if the file contains XML, use an XML parser to parse it. Don't do it by hand.
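As a rough sketch of the XML-parser suggestion, the metadata block could be collected line by line and handed to the standard library's xml.etree.ElementTree. The helper name read_metadata and the sample data are made up for illustration. The sketch assumes the lines are text (not bytes), that the block is well-formed XML, and that the closing tag ends a line; real ARC files may differ.

```python
import io
import xml.etree.ElementTree as ET

def read_metadata(header, fileobj):
    """Collect the <arcmetadata> block starting at `header` and parse it.

    Hypothetical helper: assumes well-formed XML whose closing tag ends a line.
    """
    lines = [header]
    while not lines[-1].rstrip().endswith("</arcmetadata>"):
        line = fileobj.readline()
        if not line:  # EOF reached before the closing tag
            raise ValueError("unterminated <arcmetadata> block")
        lines.append(line)
    return ET.fromstring("".join(lines))

# Made-up sample data, not taken from a real ARC file.
sample = io.StringIO(
    "<arcmetadata>\n"
    "<software>example-crawler</software>\n"
    "</arcmetadata>\n"
    "\n"
    "\n"
    "next record header\n"
)
root = read_metadata(sample.readline(), sample)
print(root.tag, root.find("software").text)
```

After the call, the file object is positioned just past the closing tag, so the blank lines can then be skipped with any of the approaches discussed here.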

Answer 2

Although there's nothing inherently wrong with what you're doing, it might express the intent more clearly to write:

    next(self.fileobj, None)

without a variable assignment, to signal that you are discarding the next line.
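A minimal sketch of that idea, using io.StringIO as a stand-in for the real file object. The helper name skip_lines is hypothetical and not part of the warc library.

```python
import io

def skip_lines(fileobj, n):
    """Discard up to n lines from an iterator of lines (hypothetical helper)."""
    for _ in range(n):
        # The None default means reaching EOF does not raise StopIteration.
        next(fileobj, None)

# Made-up sample: two blank lines followed by a record header.
sample = io.StringIO("\n\nrecord header\n")
skip_lines(sample, 2)
remaining = next(sample, None)

# Asking for more lines than remain is harmless.
skip_lines(sample, 5)
```

One caveat: on Python 2 file objects, iteration uses a read-ahead buffer, so mixing next() with readline() on the same file can misbehave. Since arc.py reads with readline(), it may be safer to stick to one style per file there.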

Answer 3

itertools may be of use here:

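    from itertools import dropwhile, islice

    if header.startswith("<arcmetadata"):
        # drop lines until the one that closes the metadata block
        fileobj = dropwhile(lambda x: not x.endswith("</arcmetadata>\n"), fileobj)
        # then skip two more lines: the closing-tag line and the line after it
        fileobj = islice(fileobj, 2, None)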
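A self-contained sketch of the same technique on made-up data. Note that dropwhile still yields the closing-tag line itself, so this version skips 3 items in order to land on the first line after two blank lines. That count is an assumption about the file layout, and the helper name after_metadata is hypothetical.

```python
import io
from itertools import dropwhile, islice

def after_metadata(lines):
    """Return an iterator positioned past the metadata block (hypothetical helper)."""
    # dropwhile stops dropping at the closing-tag line and still yields it,
    rest = dropwhile(lambda line: not line.endswith("</arcmetadata>\n"), lines)
    # so skipping 3 items removes the closing tag plus two blank lines.
    return islice(rest, 3, None)

# Made-up sample data, not taken from a real ARC file.
sample = io.StringIO(
    "<software>example-crawler</software>\n"
    "</arcmetadata>\n"
    "\n"
    "\n"
    "record header\n"
)
first = next(after_metadata(sample))
```

Because the result is a plain iterator rather than a file object, methods such as read() or readline() are no longer available on it, which matters if other code expects a real file.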

Answer 1
2, accepted, 2013-11-26 00:39:21

Answer 2
1, 2013-11-25 22:36:54

Answer 3
0, 2013-11-26 00:19:02
