
How can I change the default newline character when reading lines from a file in Python 3?

A recent question about splitting a binary file on null characters made me think of a similar, text-oriented question.

Given the following file:

 Parse me using spaces, please.

Using Raku, I can parse this file with a space (or any character of my choosing) as the input newline character, thus:

my $fh = open('spaced.txt', nl-in => ' ');

while $fh.get -> $line {
    put $line;
}

Or, more concisely:

.put for 'spaced.txt'.IO.lines(nl-in => ' ');

Either of these gives the following result:

Parse
me
using
spaces,
please.

Is there something equivalent in Python 3?

The closest thing I could find requires reading the entire file into memory:

for line in f.read().split('\0'):
    print(line)

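Update: I found a couple of other, older questions and answers that seem to indicate this isn't available, but I figured there may have been new developments in this area in the past several years:
Python restrict newline characters for readlines()
Change newline character .readline() seeks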

There is no built-in support for reading a file split on a custom character.

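However, opening a file in universal-newlines mode (the "U" flag in older versions; in Python 3 this is the default for text mode, and "U" itself was removed in Python 3.11) lets you find out which line endings were encountered, through the file object's newlines attribute. The newline mode applies to the whole file.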
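To illustrate the point, here is a minimal sketch. The helper names `newline_kinds_seen`, `try_custom_newline` and `make_sample` are hypothetical and not part of the original answer. It relies on the documented behaviour that `open()` accepts only `None`, `''`, `'\n'`, `'\r'` and `'\r\n'` for its `newline` argument, and that a text file's `newlines` attribute reports the line endings translated so far.

```python
import os
import tempfile


def newline_kinds_seen(path):
    """Read a text file in universal-newlines mode and report the endings found."""
    with open(path, "r", newline=None) as f:   # newline=None is the default
        f.read()                               # endings are recorded as data is read
        seen = f.newlines                      # None, a str, or a tuple of str
    if seen is None:
        return set()
    return {seen} if isinstance(seen, str) else set(seen)


def try_custom_newline(path, sep):
    """Return the error message if open() rejects `sep` as a newline, else None."""
    try:
        with open(path, "r", newline=sep):
            return None
    except ValueError as exc:
        return str(exc)


def make_sample():
    """Write a small file with mixed line endings and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        f.write(b"unix\nwindows\r\nend")
    return path
```

Calling `try_custom_newline(path, " ")` should return an error message rather than `None`, which is why a custom separator has to be handled in your own code.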

Here is my generator for reading the file without loading everything into memory:

def customReadlines(fileNextBuff, char):
    """
    Yield the parts of a stream that are separated by `char`.

    fileNextBuff: a function returning the next buffer, or "" on EOF
    char: the string on which the lines are split; it is not included
          in the yielded elements
    """
    lastLine = ""
    lenChar = len(char)
    while True:
        thisLine = fileNextBuff()  # call the function to fetch the next buffer
        if not thisLine:
            break  # EOF
        fnd = thisLine.find(char)
        while fnd != -1:
            yield lastLine + thisLine[:fnd]
            lastLine = ""
            thisLine = thisLine[fnd + lenChar:]
            fnd = thisLine.find(char)
        lastLine += thisLine  # keep the unfinished part for the next buffer
    yield lastLine


### EXAMPLES ###

# open file.bin and print each null-terminated part of the file, loading a buffer of 256 characters at a time
with open("file.bin", "r") as f:
    for l in customReadlines((lambda: f.read(0x100)), "\0"):
        print(l)

# open errors.log and split it on a special string, loading a whole line at a time
with open("errors.log", "r") as f:
    for l in customReadlines(f.readline, "ERROR:"):
        print(l)
        print(" " + '-' * 78)  # some separator
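One caveat, as far as I can tell: the generator searches each buffer on its own, so a multi-character separator such as "ERROR:" that happens to straddle two fixed-size buffers would be missed. A minimal sketch of a variant that searches the carried-over text together with the new buffer follows. The name `split_stream` is hypothetical, and the sketch assumes a text stream whose reader returns "" at EOF.

```python
import io
from functools import partial


def split_stream(next_buffer, sep):
    """Yield pieces of a text stream separated by `sep` (sep is not included).

    next_buffer: a callable returning the next chunk of text, or "" at EOF.
    """
    if not sep:
        raise ValueError("separator must not be empty")
    pending = ""
    for chunk in iter(next_buffer, ""):        # stop at the first empty chunk
        # Only the tail of `pending` can hold the start of a new separator.
        start = max(0, len(pending) - len(sep) + 1)
        pending += chunk
        found = pending.find(sep, start)
        while found != -1:
            yield pending[:found]
            pending = pending[found + len(sep):]
            found = pending.find(sep)
    yield pending                              # whatever follows the last separator


# Usage with an in-memory stream standing in for a real file:
stream = io.StringIO("aaERROR:bbERROR:cc")
pieces = list(split_stream(partial(stream.read, 3), "ERROR:"))
```

Here `partial(stream.read, 3)` plays the role of the `lambda: f.read(0x100)` buffer supplier, with a deliberately tiny buffer so that the separator is split across reads.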

Would this do what you need?

def newreadline(f, newlinechar='\0'):
    c = f.read(1)
    b = [c]
    while(c != newlinechar and c != ''):
        c = f.read(1)
        b.append(c)
    return ''.join(b)

Edit: added a replacement for readlines():

def newreadlines(f, newlinechar='\0'):
    line = newreadline(f, newlinechar)
    while line:
        yield line
        line = newreadline(f, newlinechar)

So that the OP can do the following:

for line in newreadlines(f, newlinechar='\0'):
    print(line)

def parse(fp, split_char, read_size=16):
    def give_chunks():
        while True:
            stuff = fp.read(read_size)
            if not stuff:
                break
            yield stuff
    leftover = ''
    for chunk in give_chunks():
        *stuff, leftover =  (leftover + chunk).split(split_char)
        yield from stuff
    if leftover:
        yield leftover
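Since the question grew out of splitting a binary file on null characters, it may be worth noting that the chunked parser starts from an empty str, so it would not work as written on a file opened in binary mode. A minimal sketch of a type-agnostic variant (the name `parse_chunks` is hypothetical) derives the empty starting value from the separator itself:

```python
import io


def parse_chunks(fp, split_char, read_size=16):
    """Chunked splitter usable with text streams (str) or binary streams (bytes)."""
    leftover = split_char[:0]          # '' for a str separator, b'' for bytes
    while True:
        chunk = fp.read(read_size)
        if not chunk:                  # empty read means EOF
            break
        # The last piece may be incomplete, so carry it over to the next chunk.
        *pieces, leftover = (leftover + chunk).split(split_char)
        yield from pieces
    if leftover:
        yield leftover


# Usage with in-memory streams standing in for real files:
binary_parts = list(parse_chunks(io.BytesIO(b"one\0two\0three"), b"\0", read_size=4))
text_parts = list(parse_chunks(io.StringIO("Parse me using spaces, please."), " ", read_size=5))
```

The separator and the stream must be of the same kind: a bytes separator for a file opened with "rb", a str separator for a text file.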

If you are fine with splitting on both newlines and split_char, the following works well (e.g. for reading a text file word by word):

def parse(fobj, split_char):
    for line in fobj:
        yield from line.split(split_char)

In [5]: for word in parse(open('stuff.txt'), ' '):
   ...:     print(word)
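Note that iterating over a file keeps the line ending on each line, so the last word of every line would still carry its "\n". A minimal sketch that strips it before splitting (the name `parse_lines` is hypothetical, and it assumes "\n" line endings as produced by Python's default text mode):

```python
import io


def parse_lines(fobj, split_char):
    """Split on both line breaks and split_char, dropping the line ending."""
    for line in fobj:
        # rstrip("\n") removes only the trailing newline, not other whitespace
        yield from line.rstrip("\n").split(split_char)


# Usage with an in-memory stream standing in for open('stuff.txt'):
words = list(parse_lines(io.StringIO("Parse me\nusing spaces,\nplease.\n"), " "))
```

An empty line in the input would still produce one empty string, because splitting an empty string on an explicit separator returns a list with one empty element.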
