Parsing and outputting file as CSV in Python
I am trying to parse a text file that has the following format:
+++++
line1
line2
<<<<<
+++++
rline1
rline2
<<<<<
Here +++++ marks the start of a record and <<<<< marks the end of a record.
Now I want to output the whole text to a CSV file in the following format:
line1, line2
rline1, rline2
I am trying something like this:
lines = ['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']
output_lines = []
for line in lines:
    if (line == "+++++") or not (line == "<<<<<"):
        if (line == "<<<<<"):
            output_lines.append(line)
            output_lines.append(",")
print(output_lines)
I am not sure how to proceed from here.
Maybe something like this?
from itertools import groupby
import csv
lines = ['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']
# remove the +++++s, so that only the <<<<<s indicate line breaks
# (compare with !=, because "is not" tests identity rather than equality)
cleaned_list = [x for x in lines if x != "+++++"]
# separate at <<<<<s
rows = [list(group) for k, group in groupby(cleaned_list, lambda x: x == "<<<<<") if not k]
# newline='' lets the csv module control the line endings itself
with open('result.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for row in rows:
        writer.writerow(row)
# read the file back to check the result
with open('result.csv', newline='') as f:
    print(f.read())
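To see what the groupby step produces before anything touches the disk, here is a minimal sketch that runs the same idea on the in-memory list and writes to an io.StringIO instead of a file. The helper name split_records and its start/end parameters are made up for illustration and are not part of the answer above.

```python
import csv
import io
from itertools import groupby


def split_records(lines, start="+++++", end="<<<<<"):
    """Hypothetical helper: split lines into records separated by end markers."""
    # Drop the start markers so that only the end markers separate records.
    cleaned = [x for x in lines if x != start]
    # groupby yields alternating runs of marker and non-marker lines;
    # keep only the non-marker runs, one list per record.
    return [
        list(group)
        for is_end, group in groupby(cleaned, lambda x: x == end)
        if not is_end
    ]


lines = ['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']
rows = split_records(lines)

# Write to an in-memory buffer so the CSV text can be inspected directly.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()

print(rows)
print(csv_text)
```

Because consecutive end markers fall into a single group, a record with no lines between +++++ and <<<<< simply produces no row in this sketch.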
Collect the lines in a nested loop until the end-of-record marker is reached, then write the resulting list to the CSV file:
import csv
with open(inputfilename) as infh, open(outputfilename, 'w', newline='') as outfh:
    writer = csv.writer(outfh)
    for line in infh:
        if not line.startswith('+++++'):
            continue
        # found start, collect lines until end-of-record
        row = []
        for line in infh:
            if line.startswith('<<<<<'):
                # found end, end this inner loop
                break
            row.append(line.rstrip('\n'))
        if row:
            # lines for this record are added to the CSV file as a single row
            writer.writerow(row)
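The outer loop takes lines from the input file but skips anything that does not look like the start of a record. Once a start marker is found, the second, inner loop draws further lines from the same file object and, as long as they do not look like the end of a record, adds them to a list object (without the line separator).
When the end of the record is found, the inner loop ends, and if any lines were collected in the row list, that list is written to the CSV file.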
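The nested-loop technique relies on both loops pulling from one shared iterator, which a file object already is. The sketch below wraps the same logic in a generator so it can be reused and tried on any iterable of lines; the name iter_records, its parameters, and the extra "noise" line in the sample are assumptions added for illustration.

```python
import csv
import io


def iter_records(lines, start="+++++", end="<<<<<"):
    """Hypothetical helper: yield one list of lines per start/end delimited record."""
    # Share a single iterator between both loops; a plain list would
    # otherwise restart from the beginning in the inner loop.
    lines = iter(lines)
    for line in lines:
        if not line.startswith(start):
            continue  # skip anything outside a record
        row = []
        for line in lines:  # continues where the outer loop stopped
            if line.startswith(end):
                break
            row.append(line.rstrip('\n'))
        if row:
            yield row


sample = io.StringIO(
    "+++++\nline1\nline2\n<<<<<\nnoise\n+++++\nrline1\nrline2\n<<<<<\n"
)
records = list(iter_records(sample))

out = io.StringIO()
csv.writer(out).writerows(records)
print(records)
```

Separating the parsing from the writing keeps the CSV step down to a single writerows() call, and the generator can be fed an open file just as well as the StringIO used here.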
Demo:
>>> import csv
>>> from io import StringIO
>>> import sys
>>> demo = StringIO('''\
... +++++
... line1
... line2
... <<<<<
... +++++
... rline1
... rline2
... <<<<<
... ''')
>>> writer = csv.writer(sys.stdout)
>>> for line in demo:
...     if not line.startswith('+++++'):
...         continue
...     row = []
...     for line in demo:
...         if line.startswith('<<<<<'):
...             break
...         row.append(line.rstrip('\n'))
...     if row:
...         writer.writerow(row)
...
line1,line2
13
rline1,rline2
15
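The numbers after the written lines are the return values of writer.writerow(), echoed by the interactive interpreter: the number of characters written for each row, including the \r\n line terminator.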
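The return value can be checked directly. A small sketch, assuming a text-mode io.StringIO as the target (the variable names are illustrative):

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)

# writerow() passes through the return value of the underlying write() call,
# which for a text stream is the number of characters written.
count = writer.writerow(['line1', 'line2'])

print(count)
print(repr(buffer.getvalue()))
```

In an interactive session, assigning the result to a variable as above also keeps the number from being echoed between the rows.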