

Python - Handling an nth line hop with readlines()

I'm having a go at fixing a broken lib on GitHub that I want to use.

I have "fixed" the problem locally, but I don't think it's a very clean method...

I'm poking at the WARC library by the Internet Archive, specifically the arc.py part ( https://github.com/internetarchive/warc/blob/master/warc/arc.py ).

Since the lib was written, the tools that produce ARC files have changed a bit. As a result, the built-in parser fails, because it does not expect to see some metadata in the file.

My local fix looks like this:

    if header.startswith("<arcmetadata"):
        while not header.endswith("</arcmetadata>\n"):
            header = self.fileobj.readline()
        header = self.fileobj.readline()
        header = self.fileobj.readline()

I'm not sure that calling readline() twice to skip the next two empty lines (each containing just "\n") is the cleanest way to advance through the file object.

Is this good Python, or is there a better way?

The two identical readline() calls look like a copy/paste error. There is nothing wrong with using .readline(); just document what you are doing:

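    # skip metadata
    if header.startswith("<arcmetadata"):
        while header and not header.endswith("</arcmetadata>\n"):
            header = self.fileobj.readline()
        # NOTE: header holds the closing "</arcmetadata>" line here, i.e. it
        # is not blank, so read one more line to move past it
        header = self.fileobj.readline()

    # skip blank lines (readline() returns "" at EOF, which ends the loop)
    while header and not header.strip():
        header = self.fileobj.readline()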
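To check the skip logic outside the library, here is a minimal standalone sketch. The function name `skip_arc_metadata` and the sample text are hypothetical. The sketch assumes a text-mode file object whose `readline()` returns `""` at end of file, as `io.StringIO` and `open(..., "r")` do. It reads one more line after the metadata loop, because at that point `header` still holds the closing tag line, and that line is not blank.

```python
import io


def skip_arc_metadata(fileobj, header):
    """Return the first non-blank line after an <arcmetadata> block.

    Hypothetical helper: `header` is the line just read from `fileobj`.
    Returns "" if end of file is reached first.
    """
    # skip metadata
    if header.startswith("<arcmetadata"):
        while header and not header.endswith("</arcmetadata>\n"):
            header = fileobj.readline()
        # header is the closing tag line here; move past it
        header = fileobj.readline()

    # skip blank lines; readline() returns "" at EOF, which ends the loop
    while header and not header.strip():
        header = fileobj.readline()
    return header


# hypothetical sample: a metadata block, two blank lines, then a record line
sample = io.StringIO(
    "<arcmetadata>\n"
    "<tool>example</tool>\n"
    "</arcmetadata>\n"
    "\n"
    "\n"
    "http://example.com/ 127.0.0.1 20240101000000 text/html 42\n"
)
first = sample.readline()
print(skip_arc_metadata(sample, first), end="")
# prints: http://example.com/ 127.0.0.1 20240101000000 text/html 42
```

Checking `header` for truthiness in both loops keeps a truncated file from spinning forever, since `"".strip()` is empty and `"".endswith(...)` is false.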

By the way, if the file contains XML, use an XML parser to parse it. Don't do it by hand.
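If the metadata block is well-formed XML, one way to follow that advice with the standard library is to collect the block's lines and hand them to `xml.etree.ElementTree`. The helper name `read_arc_metadata` and the element names in the sample are hypothetical. The sketch assumes the closing tag ends its own line. Real ARC metadata may carry an XML declaration or namespaces that need extra handling.

```python
import io
import xml.etree.ElementTree as ET


def read_arc_metadata(fileobj, header):
    """Collect an <arcmetadata> block starting at `header` and parse it.

    Hypothetical helper: returns an Element, or None if `header` does not
    open a metadata block.
    """
    if not header.startswith("<arcmetadata"):
        return None
    lines = [header]
    # keep reading until the closing tag (or EOF, where readline() gives "")
    while header and not header.endswith("</arcmetadata>\n"):
        header = fileobj.readline()
        lines.append(header)
    # ET.fromstring raises ET.ParseError if the block is not well-formed
    return ET.fromstring("".join(lines))


sample = io.StringIO(
    "<arcmetadata>\n"
    "<software>example-crawler 1.0</software>\n"
    "<operator>someone</operator>\n"
    "</arcmetadata>\n"
)
root = read_arc_metadata(sample, sample.readline())
print(root.tag, root.findtext("software"))
# prints: arcmetadata example-crawler 1.0
```

The file object is left positioned just after the closing tag line, so any blank-line skipping still has to happen afterwards.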

Although there's nothing inherently wrong with what you're doing, it might be more semantic to write:

    next(self.fileobj, None)

without a variable assignment, to signify that you are discarding the next line.
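The example below is a small runnable illustration. With a default argument, `next()` returns that default instead of raising `StopIteration` at end of file, so the discard is safe to call unconditionally.

The `skip_lines` helper is hypothetical. It extends the idea to n lines using the "consume" recipe from the `itertools` documentation (`next(islice(iterator, n, n), None)`).

One caveat: on Python 2, calling `readline()` on a built-in file object after iterating it raised `ValueError` ("Mixing iteration and read methods would lose data"). On that version, stick to one style per file object.

```python
import io
from itertools import islice


def skip_lines(fileobj, n):
    """Discard the next n lines of fileobj (fewer if EOF comes first).

    Hypothetical helper built on the itertools "consume" recipe.
    """
    # islice(fileobj, n, n) advances past n items and yields nothing;
    # next(..., None) drives it without raising StopIteration
    next(islice(fileobj, n, n), None)


f = io.StringIO("</arcmetadata>\n\n\nrecord header\n")
next(f, None)                 # discard the closing tag line
skip_lines(f, 2)              # discard the two blank lines
print(f.readline(), end="")   # prints: record header
print(repr(next(f, None)))    # prints: None (EOF, no StopIteration)
```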

itertools may be of use here:

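    from itertools import islice, dropwhile

    if header.startswith("<arcmetadata"):
        # drop lines up to (but not including) the closing tag line
        fileobj = dropwhile(lambda x: not x.endswith("</arcmetadata>\n"), fileobj)
        # then skip the closing tag line and the line after it
        fileobj = islice(fileobj, 2, None)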
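To see which lines this pipeline yields, here is a self-contained run over an in-memory file with hypothetical content.

Note that `dropwhile` yields the closing tag line itself. `islice(..., 2, None)` therefore drops that line plus exactly one more. To skip the closing line and both blank lines, as the question's two readline() calls do, the start index would be 3, which is what this sketch uses.

Rebinding the name produces a lazy iterator with no `readline()` method, so the rest of the code has to consume it by iteration.

```python
import io
from itertools import dropwhile, islice

f = io.StringIO(
    "<arcmetadata>\n"
    "<tool>example</tool>\n"
    "</arcmetadata>\n"
    "\n"
    "\n"
    "first record header\n"
)
header = f.readline()  # the opening "<arcmetadata>" line

lines = f  # plain iteration over the file object
if header.startswith("<arcmetadata"):
    # dropwhile stops dropping at the closing tag line and yields it
    lines = dropwhile(lambda x: not x.endswith("</arcmetadata>\n"), lines)
    # skip the closing tag line and both blank lines
    lines = islice(lines, 3, None)

print(list(lines))  # prints: ['first record header\n']
```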
