If the next line of a file contains a string, append it to the end of the current one
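I have a CSV with 13 million lines. The data is not wrapped in quotes and contains embedded newlines, so a single row of data can be split across two lines. No row has more than one such break.
How would I take data like this: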
Line of data
Line of data
continuation of previous line of data
Line of data
Line of data
continuation of previous line
Line of data
and turn it into this:
Line of data
Line of data continuation of previous line of data
Line of data
Line of data continuation of previous line
Line of data
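I have tested this by storing a line in a variable, reading the next one, checking whether its first character is something other than 'L', and appending it if so. I have also tried using f.tell() and f.seek() to move around in the file, but I couldn't get that to work.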
Assuming that every line beginning with a space should be joined to the previous line, this should work:
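# `data` is assumed to hold the path to the CSV file.
with open(data) as infile:
    previous_line = None
    for line in infile:
        if line.startswith(' ') and previous_line is not None:
            # Continuation: glue it onto the line being held back.
            previous_line = previous_line.rstrip() + ' ' + line.strip()
            continue
        # A new record starts here, so the held-back line is complete.
        if previous_line is not None:
            print(previous_line.rstrip())
        previous_line = line
    # Flush the last held-back line once the file is exhausted.
    if previous_line is not None:
        print(previous_line.rstrip())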
Here is a cheap, reasonably efficient continuation-line joiner for you.
def cont_lines(source):
    last_line = ''
    for line in source:
        if line.startswith(' '):
            # Append a continuation, dropping the newline that ended last_line.
            last_line = last_line.rstrip('\r\n') + ' ' + line.lstrip()
        else:
            if last_line:
                yield last_line
            last_line = line
    if last_line:  # The one remaining as the source has ended.
        yield last_line
Use it like this:
with open("tile.csv") as f:
    for line in cont_lines(f):
        # do something with line
        print(line, end='')
It only uses as much memory as the longest run of continuation lines in the file.
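The generator above relies on continuation lines starting with a space, whereas the question identifies them by a first character other than 'L'. A minimal sketch of the same buffering idea with the test passed in as a function might look like the following. The names `join_continuations` and `is_continuation` are made up for illustration, and joining with a single space is an assumption based on the desired output shown in the question.

```python
import io


def join_continuations(lines, is_continuation):
    """Yield lines, merging each continuation line into the line before it."""
    pending = None  # the record currently being assembled
    for line in lines:
        if pending is not None and is_continuation(line):
            # Drop the newline that ended the pending record, then append.
            pending = pending.rstrip('\r\n') + ' ' + line.strip() + '\n'
        else:
            if pending is not None:
                yield pending
            pending = line
    if pending is not None:  # flush the final record
        yield pending


# In-memory stand-in for the real file, using the sample from the question.
sample = io.StringIO(
    "Line of data\n"
    "Line of data\n"
    "continuation of previous line of data\n"
    "Line of data\n"
)

# Rule from the question: a line not starting with 'L' is a continuation.
for row in join_continuations(sample, lambda s: not s.startswith('L')):
    print(row, end='')
```

Because only the pending record is held in memory, the same loop can be pointed at an open file object instead of the `io.StringIO` stand-in.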
I was able to work something out.
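infile = "test.txt"

def peek_line(f):
    """Return the next line without consuming it."""
    pos = f.tell()
    line = f.readline()
    f.seek(pos)
    return line

with open(infile, 'r') as f:
    while True:
        line = f.readline()
        if not line:
            break
        peek = peek_line(f)
        # A next line that does not start with 'T' is treated as a continuation.
        # The `peek` check skips the join at end of file, where peek is ''.
        if peek and not peek.startswith('T'):
            line = line.strip() + f.readline()
        print(line, end='')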
I am open to feedback on this approach.
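One possible piece of feedback: the look-ahead can be done without f.tell() and f.seek() by wrapping the line iterator in a small object that keeps one line in reserve. The sketch below is only an illustration under stated assumptions: `PeekableLines` and `join_with_peek` are hypothetical names, the 'T' marker is taken from the code above, and the single space inserted at the join is an assumption about the desired output.

```python
import io


class PeekableLines:
    """Wrap any iterable of lines and allow looking one line ahead."""

    def __init__(self, lines):
        self._it = iter(lines)
        self._next = next(self._it, None)  # None marks end of input

    def peek(self):
        """Return the upcoming line without consuming it (None at the end)."""
        return self._next

    def __iter__(self):
        return self

    def __next__(self):
        if self._next is None:
            raise StopIteration
        current, self._next = self._next, next(self._it, None)
        return current


def join_with_peek(lines, marker='T'):
    """Yield lines, pulling in the next line when it lacks the marker."""
    reader = PeekableLines(lines)
    for line in reader:
        upcoming = reader.peek()
        if upcoming is not None and not upcoming.startswith(marker):
            # Consume the continuation and attach it to the current line.
            line = line.rstrip('\r\n') + ' ' + next(reader).lstrip()
        yield line


# In-memory stand-in for test.txt (contents are made up for the demo).
demo = io.StringIO("Test 1\nTest 2\ncont\nTest 3\n")
for row in join_with_peek(demo):
    print(row, end='')
```

Since nothing here seeks, the same loop should also work on input that cannot be rewound, such as a pipe, and it reads the file in a single forward pass.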