If the next line of a file contains a string, append it to the end of the current one
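I have a CSV file with 13 million lines. The data is not enclosed in quotes and contains newline characters, so a single row of data can be broken across two lines. Each row has at most one such break, never several.
How would I take data like this?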
Line of data
Line of data
continuation of previous line of data
Line of data
Line of data
continuation of previous line
Line of data
And turn it into this:
Line of data
Line of data continuation of previous line of data
Line of data
Line of data continuation of previous line
Line of data
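I have tested this by storing a line in a variable, reading the next one, checking whether its first character is not 'L', and appending it if so. I also tried using f.tell() and f.seek() to move around in the file, but I could not get that to work.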
Assuming every line that starts with a space should be joined to the line before it, this should work:
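with open(data) as infile:  # data: path to the input file
    previous_line = None
    for line in infile:
        line = line.rstrip('\n')
        if previous_line is not None and line.startswith(' '):
            # Continuation: append it to the buffered line.
            previous_line += line
        else:
            # A new record begins, so the buffered one is complete.
            if previous_line is not None:
                print(previous_line)
            previous_line = line
    if previous_line is not None:
        print(previous_line)  # flush the final record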
Here is a cheap, reasonably efficient continuation-line joiner for you.
def cont_lines(source):
    last_line = ''
    for line in source:
        if line.startswith(' '):
            # Append a continuation, dropping the newline that split the record.
            last_line = last_line.rstrip('\n') + ' ' + line.lstrip()
        else:
            if last_line:
                yield last_line
            last_line = line
    if last_line:  # The one remaining as the source has ended.
        yield last_line
Use it like this:
with open("tile.csv") as f:
    for line in cont_lines(f):
        # do something with line
        print(line, end='')
It only uses as much memory as the longest run of continuation lines in the file.
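For a file of this size, the joined records can be written straight to a new file rather than collected in memory. The following is only a minimal sketch, assuming that continuation lines start with a space; `joined_records` and `join_file` are hypothetical names introduced for illustration.

```python
def joined_records(lines):
    """Yield records, folding lines that start with a space into the previous one."""
    pending = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(" ") and pending is not None:
            pending += line  # the leading space doubles as the separator
        else:
            if pending is not None:
                yield pending
            pending = line
    if pending is not None:
        yield pending  # flush the final record


def join_file(src_path, dst_path):
    """Stream src_path to dst_path one record at a time; return the record count."""
    count = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for record in joined_records(src):
            dst.write(record + "\n")
            count += 1
    return count
```

Because both files are processed line by line, only the record currently being assembled is held in memory.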
I was able to work something out.
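infile = "test.txt"

def peek_line(f):
    # Read the next line without consuming it.
    pos = f.tell()
    line = f.readline()
    f.seek(pos)
    return line

with open(infile, 'r') as f:
    while True:
        line = f.readline()
        if not line:
            break
        peek = peek_line(f)
        # A non-empty next line that does not start with 'T' is a continuation.
        if peek and not peek.startswith('T'):
            line = line.strip() + ' ' + f.readline()
        print(line, end='')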
I am open to feedback on this approach.
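One possible alternative, offered as a sketch rather than a definitive fix: buffer one record and decide what to do with it when the next line arrives, which avoids f.tell() and f.seek() altogether. It assumes every real record starts with a known prefix ('L' in the question's sample data); `join_by_prefix` and `record_prefix` are hypothetical names introduced for illustration.

```python
def join_by_prefix(lines, record_prefix):
    """Yield records; a line not starting with record_prefix continues the previous one."""
    pending = None
    for line in lines:
        line = line.rstrip("\n")
        if pending is None or line.startswith(record_prefix):
            # A new record starts here, so the buffered one is complete.
            if pending is not None:
                yield pending
            pending = line
        else:
            # Assumption: anything else is the tail of the buffered record.
            pending = pending + " " + line.lstrip()
    if pending is not None:
        yield pending


sample = [
    "Line of data\n",
    "Line of data\n",
    "continuation of previous line of data\n",
    "Line of data\n",
]
for record in join_by_prefix(sample, "L"):
    print(record)
```

The prefix test only works if no continuation happens to begin with the same prefix, so real data may need a stricter check on what counts as the start of a record.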