
Using Python to parse a text file with a delimiter

I need to parse a text file that looks like this:

"id"$"date"$"text"

  10001$2016-01-11$"[start]
  this is some text
  [stop]
  "
  10002$2014-03-12$"[start]
  this is some more text
  [stop]
  "

I want to load it into a Python dictionary, with the three elements (id, date, and text) as keys.

I am not sure how to split these elements on the delimiter, or how to use the first line as the keys for every element in the list.

Even something like this, which just prints the data, would do:

infile = open('filename.txt', 'r')
for line in infile:
    if "????" in line:
        print(line, next(infile))

If I try:

infile = open('filename.txt', 'r')
for line in infile:
    if '"text"' in line:
        print(next(infile))

It only prints the first line.
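One likely reason is that the quoted string `"text"` appears only in the header line, so the condition is true once and `next(infile)` is called once. The sketch below reproduces this, assuming the sample data without its leading indentation and using `io.StringIO` as a stand-in for the real file:

```python
import io

# Sample data from the question; io.StringIO stands in for filename.txt.
SAMPLE = (
    '"id"$"date"$"text"\n'
    '10001$2016-01-11$"[start]\n'
    'this is some text\n'
    '[stop]\n'
    '"\n'
    '10002$2014-03-12$"[start]\n'
    'this is some more text\n'
    '[stop]\n'
    '"\n'
)

infile = io.StringIO(SAMPLE)
printed = []
for line in infile:
    # Only the header contains "text" with the surrounding double quotes.
    if '"text"' in line:
        printed.append(next(infile))

print(printed)
```

Only the line that follows the header ends up in `printed`; none of the later lines contains the quoted word, so nothing else matches.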

Ideally, the result would be:

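[{'id':'10001', 'date':'2016-01-11', 'text':'this is some text'},{'id':'10002', 'date':'2014-03-12', 'text':'this is some more text'}]

import csv

# In Python 3 the csv module expects a text-mode file opened with newline=''
# (the original 'rb' mode only works in Python 2).
with open(path, newline='') as f:
    reader = csv.reader(f, delimiter='$')
    next(reader)  # skip the header row
    # Note: 'text' still contains the [start]/[stop] markers at this point.
    res = [{'id': line[0], 'date': line[1], 'text': line[2]} for line in reader]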

You can use Python's built-in csv library to parse the file.
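The csv module fits here because it treats a double-quoted field as a single value even when that field spans several lines. A minimal sketch, assuming the sample data has no leading indentation and using `io.StringIO` in place of the real file:

```python
import csv
import io

# Sample data from the question; io.StringIO stands in for the real file.
SAMPLE = (
    '"id"$"date"$"text"\n'
    '10001$2016-01-11$"[start]\n'
    'this is some text\n'
    '[stop]\n'
    '"\n'
    '10002$2014-03-12$"[start]\n'
    'this is some more text\n'
    '[stop]\n'
    '"\n'
)

# newline='' leaves line endings untouched so csv can handle them itself.
rows = list(csv.reader(io.StringIO(SAMPLE, newline=''), delimiter='$'))

for row in rows:
    print(row)
```

Each record comes back as one three-element list, with the markers and embedded newlines still inside the third field, so the only remaining work is cleaning up that field.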

import csv


class Parser(object):
    START_TEXT = "[start]"
    END_TEXT = "[stop]"

    def __init__(self, filename):
        self.filename = filename


    def parse_file(self):
        elements = []

        with open(self.filename, 'r', newline='') as f:
            reader = csv.reader(f, delimiter='$')
            first_row = next(reader)

            key0 = first_row[0]
            key1 = first_row[1]
            key2 = first_row[2]

            for row in reader:
                elements.append({
                    key0: row[0],
                    key1: row[1],
                    key2: self.parse_text(row[2]),
                })

        return elements

    @classmethod
    def parse_text(cls, text):
        start_idx = text.index(cls.START_TEXT)
        end_idx = text.index(cls.END_TEXT)

        # Take exactly what lies between the two markers.
        new_txt = text[start_idx + len(cls.START_TEXT):end_idx]

        return new_txt.strip('\n')


p = Parser("infile.txt")
elements = p.parse_file()

print(elements)

Output:

[{'date': '2016-01-11', 'text': 'this is some text', 'id': '10001'}, {'date': '2014-03-12', 'text': 'this is some more text', 'id': '10002'}]

import csv

# DictReader takes the keys from the header row automatically.
with open('f.txt', newline='') as fp:
    reader = csv.DictReader(fp, delimiter="$")
    data = list(reader)

# Strip the markers and newlines from every field.
for row in data:
    row.update({
        k: v.replace('[start]', '').replace('[stop]', '').replace('\n', '')
        for k, v in row.items()})

print(data)
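As a variation on the approach above, the cleanup can be limited to the text field and done with a regular expression, which leaves any newlines inside the message itself intact. This is only a sketch: `extract_text` and `parse_records` are hypothetical helper names, and the sample is assumed to have no leading indentation.

```python
import csv
import io
import re

# Matches whatever sits between the [start] and [stop] markers, across lines.
MARKED_TEXT = re.compile(r'\[start\](.*?)\[stop\]', re.DOTALL)


def extract_text(value):
    """Return the text between the markers, or the stripped value if absent."""
    match = MARKED_TEXT.search(value)
    return (match.group(1) if match else value).strip()


def parse_records(stream):
    """Read '$'-delimited records from an open text stream into plain dicts."""
    reader = csv.DictReader(stream, delimiter='$')
    return [
        {'id': row['id'], 'date': row['date'], 'text': extract_text(row['text'])}
        for row in reader
    ]


# Sample data from the question; io.StringIO stands in for the real file.
sample = io.StringIO(
    '"id"$"date"$"text"\n'
    '10001$2016-01-11$"[start]\n'
    'this is some text\n'
    '[stop]\n'
    '"\n'
    '10002$2014-03-12$"[start]\n'
    'this is some more text\n'
    '[stop]\n'
    '"\n',
    newline='',
)

records = parse_records(sample)
print(records)
```

With a real file, the same function could be called as `parse_records(open('f.txt', newline=''))` inside a `with` block.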

