If the next line of a file contains a string, append it to the end of the current one

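I have a CSV with 13 million lines. The data is not wrapped in quotes and contains embedded newlines, so a single record ends up split across two lines. No record has more than one such break.

How would I take data like this: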

Line of data
Line of data
 continuation of previous line of data
Line of data
Line of data
 continuation of previous line
Line of data

and convert it into:

Line of data
Line of data continuation of previous line of data
Line of data
Line of data continuation of previous line
Line of data

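I have tested this by storing a line in a variable, reading the next one, checking whether its first character is not 'L', and appending it if so. I have also tried moving around in the file with f.tell() and f.seek(), but I could not get that to work.

Assuming every line that starts with a space should be joined to the previous line, this should work: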

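with open(data) as infile:  # data is the path to the input file
    previous_line = None
    for line in infile:
        if line.startswith(' ') and previous_line is not None:
            # Continuation: join it onto the buffered line.
            previous_line = previous_line.rstrip('\n') + line.rstrip('\n')
        else:
            # A new record starts, so the buffered one is complete.
            if previous_line is not None:
                print(previous_line.rstrip('\n'))
            previous_line = line
    if previous_line is not None:  # flush the last record
        print(previous_line.rstrip('\n'))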

Here is a cheap, reasonably efficient continuation-line joiner for you.

def cont_lines(source):
    last_line = ''
    for line in source:
        if line.startswith(' '):
            # Append a continuation: drop the newline that ended the
            # previous piece and keep the leading space as the separator.
            last_line = last_line.rstrip('\r\n') + line
        else:
            if last_line:
                yield last_line
            last_line = line
    if last_line:  # The one remaining as the source has ended.
        yield last_line

Use it like this:

with open("tile.csv") as f:
    for line in cont_lines(f):
        pass  # do something with line

It only uses as much memory as the longest run of continuation lines in the file.
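To make the memory point concrete, here is a minimal sketch that streams the joined lines straight into an output file object instead of collecting them. The names join_continuations and rewrite are hypothetical, and the sketch assumes, as in the question, that a continuation line always starts with a space. It uses io.StringIO in place of real files so it can be run as-is.

```python
import io

def join_continuations(source):
    """Yield logical lines, merging space-prefixed lines into the previous one."""
    pending = None
    for line in source:
        if line.startswith(' ') and pending is not None:
            # Drop the newline that ended the previous piece, keep the space.
            pending = pending.rstrip('\r\n') + line
        else:
            if pending is not None:
                yield pending
            pending = line
    if pending is not None:
        yield pending

def rewrite(src, dst):
    """Write the joined lines from src to dst; return how many were written."""
    count = 0
    for line in join_continuations(src):
        dst.write(line)  # only one logical line is held in memory at a time
        count += 1
    return count

sample = io.StringIO(
    "Line of data\n"
    "Line of data\n"
    " continuation of previous line of data\n"
    "Line of data\n"
)
out = io.StringIO()
written = rewrite(sample, out)
```

With real files, src and dst would be two open() handles, one for reading and one for writing.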

I was able to work something out.

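infile = "test.txt"

def peek_line(f):
    # Read the next line without consuming it.
    pos = f.tell()
    line = f.readline()
    f.seek(pos)
    return line

with open(infile, 'r') as f:
    while True:
        line = f.readline()
        if not line:
            break
        peek = peek_line(f)
        # Assumes every new record in test.txt starts with 'T';
        # anything else is treated as a continuation.
        if peek and not peek.startswith('T'):
            line = line.rstrip('\n') + f.readline()
        print(line, end='')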

I am open to feedback on this method.
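As one piece of feedback, the peek idea can be wrapped in a generator so that the result is testable and nothing is printed as a side effect. This is only a sketch: join_with_peek is a hypothetical name, and it uses the leading-space rule from the question's sample rather than a fixed first letter. It sticks to readline() throughout because, in Python 3, calling f.tell() on a text file while iterating with `for line in f` raises OSError.

```python
import os
import tempfile

def peek_line(f):
    """Return the next line without consuming it."""
    pos = f.tell()
    line = f.readline()
    f.seek(pos)
    return line

def join_with_peek(path):
    """Yield joined lines (without trailing newline) from the file at path."""
    with open(path, 'r') as f:
        while True:
            line = f.readline()
            if not line:
                break
            peek = peek_line(f)
            # Assumption: a continuation line always starts with a space.
            if peek and peek.startswith(' '):
                line = line.rstrip('\n') + f.readline()
            yield line.rstrip('\n')

# Small demonstration on a temporary file.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as tmp:
    tmp.write("Line of data\n continuation\nLine of data\n")
result = list(join_with_peek(path))
os.remove(path)
```

Every line is read twice with this approach (once to peek, once for real), so a single-pass buffer like cont_lines is likely the lighter choice for a 13-million-line file.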

