Parsing and outputting file as CSV in Python

I am trying to parse a text file that has the following format:

+++++
line1
line2
<<<<<
+++++
rline1
rline2
<<<<<

Here, +++++ marks the start of a record and <<<<< marks the end of a record.

Now I want to output the whole text to a CSV file in the following format:

line1, line2
rline1, rline2

I am trying something like this:

lines =['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']
output_lines =[]

for line in lines:
    if (line == "+++++") or not(line == "<<<<<") :
        if (line == "<<<<<"):
            output_lines.append(line)
            output_lines.append(",")

print (output_lines)

I am not sure how to proceed from here.

Maybe something like this?

from itertools import groupby
import csv

lines =['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']

# remove the +++++s, so that only the <<<<<s indicate line breaks
# compare with != (value equality); "is not" tests object identity and is unreliable for strings
cleaned_list = [x for x in lines if x != "+++++"]

# separate at <<<<<s
rows = [list(group) for k, group in groupby(cleaned_list, lambda x: x == "<<<<<") if not k]

# newline='' lets the csv module control line endings (Python 3)
with open('result.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(rows)

# show what was written
with open('result.csv') as f:
    print(f.read())
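To see why filtering on the groupby() key yields one list per record, it may help to print the intermediate groups. The following is only an illustrative sketch; the names cleaned, groups and is_end are made up for this example and do not appear in the answer above.

```python
from itertools import groupby

lines = ['+++++', 'line1', 'line2', '<<<<<', '+++++', 'rline1', 'rline2', '<<<<<']
cleaned = [x for x in lines if x != '+++++']

# groupby() starts a new group every time the key value changes, so the key
# alternates between False (record lines) and True (the '<<<<<' marker).
groups = [(is_end, list(group))
          for is_end, group in groupby(cleaned, lambda x: x == '<<<<<')]

for is_end, items in groups:
    print(is_end, items)

# Keeping only the groups whose key is False leaves one list per record.
rows = [items for is_end, items in groups if not is_end]
print(rows)
```

One consequence of this design: consecutive '<<<<<' markers fall into a single group, so a record with no lines between its markers produces no row at all.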

Collect lines in a nested loop until the end-of-record marker, then write the resulting list to the CSV file:

import csv

with open(inputfilename) as infh, open(outputfilename, 'w', newline='') as outfh:
    writer = csv.writer(outfh)
    for line in infh:
        if not line.startswith('+++++'):
            continue

        # found start, collect lines until end-of-record
        row = []
        for line in infh:
            if line.startswith('<<<<<'):
                # found end, end this inner loop
                break
            row.append(line.rstrip('\n'))

        if row:
            # lines for this record are added to the CSV file as a single row
            writer.writerow(row)

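The outer loop takes lines from the input file but skips anything that does not look like the start of a record. Once a start is found, a second, inner loop draws further lines from the same file object and, as long as they do not look like the end of a record, adds them to a list (without the line separator).

When the end of a record is found, the inner loop ends, and if any lines were collected in the row list, they are written to the CSV file as a single row.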
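The nested loops only work because both of them consume the same iterator; a file object is its own iterator, whereas a plain list would restart from the beginning in the inner loop. As a rough sketch of the same idea packaged for reuse, the generator below calls iter() explicitly so that it also accepts lists. The function name iter_records and its start and end parameters are hypothetical, chosen for this example only.

```python
import csv
import io


def iter_records(lines, start='+++++', end='<<<<<'):
    """Yield one list of lines per record delimited by start/end markers."""
    lines = iter(lines)  # make sure both loops share a single iterator
    for line in lines:
        if not line.startswith(start):
            continue  # skip anything outside a record
        row = []
        for line in lines:  # continues where the outer loop left off
            if line.startswith(end):
                break
            row.append(line.rstrip('\n'))
        if row:
            yield row


sample = io.StringIO(
    '+++++\nline1\nline2\n<<<<<\n+++++\nrline1\nrline2\n<<<<<\n'
)
out = io.StringIO()
csv.writer(out).writerows(iter_records(sample))
print(out.getvalue())
```

Separating the parsing from the writing this way keeps the record logic testable without touching the file system.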

Demo:

>>> import csv
>>> from io import StringIO
>>> import sys
>>> demo = StringIO('''\
... +++++
... line1
... line2
... <<<<<
... +++++
... rline1
... rline2
... <<<<<
... ''')
>>> writer = csv.writer(sys.stdout)
>>> for line in demo:
...     if not line.startswith('+++++'):
...         continue
...     row = []
...     for line in demo:
...         if line.startswith('<<<<<'):
...             break
...         row.append(line.rstrip('\n'))
...     if row:
...         writer.writerow(row)
... 
line1,line2
13
rline1,rline2
15

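The numbers after the written lines are the number of characters written (including the '\r\n' row terminator). They are the return value of the underlying write() call, which writer.writerow() passes through and the interactive interpreter echoes.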
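The return value can be checked in isolation. The snippet below is a small sketch that writes to an in-memory io.StringIO buffer instead of sys.stdout; the names buffer and written are made up for this example.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer)

# writerow() returns whatever the underlying write() call returns; for a
# text stream that is the number of characters written.
written = writer.writerow(['line1', 'line2'])

print(written)                   # the count includes the '\r\n' terminator
print(repr(buffer.getvalue()))   # the exact text that was written
```

In a script these return values are simply discarded; they only show up in the demo because the interactive prompt prints the result of each expression.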

