Parsing a text file with a delimiter using Python

Question

I need to parse a text file that looks like this:

"id"$"date"$"text"

10001$2016-01-11$"[start]
this is some text
[stop]
"
10002$2014-03-12$"[start]
this is some more text
[stop]
"

into a list of Python dictionaries, with the three elements (id, date and text) as keys.

I'm not sure how to split these elements on the delimiter, or how to use the first line as the keys for every element in the list.

Even something like this, which just prints it, would do:

infile = open('filename.txt', 'r')
for line in infile:
    if "????" in line:
        print(line, next(infile))

If I try:

infile = open('filename.txt', 'r')
for line in infile:
    if '"text"' in line:
        print(next(infile))

it only prints the first line.

Ideally the result would be:

[{'id':'10001', 'date':'2016-01-11', 'text':'this is some text'},{'id':'10002', 'date':'2014-03-12', 'text':'this is some more text'}]

Answer 1

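import csv

# `path` is the path to the input file.
# Python 3: open in text mode with newline='' as the csv module expects (not 'rb').
with open(path, newline='') as f:
    reader = csv.reader(f, delimiter='$')
    res = [{'id': line[0], 'date': line[1], 'text': line[2]} for line in reader]
    res = res[1:]  # drop the header row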
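
Note that the 'text' values built this way still contain the [start] and [stop] markers and the embedded newlines. Below is a minimal sketch of one way to clean them up, assuming the sample from the question (with its data lines not actually indented in the real file) and reading it from an in-memory string rather than a file. The helper name clean_text is hypothetical, not part of the csv module.

```python
import csv
import io

# Sample data copied from the question, held in memory instead of a file.
# Assumption: the data lines are not indented in the real file.
SAMPLE = (
    '"id"$"date"$"text"\n'
    '10001$2016-01-11$"[start]\n'
    'this is some text\n'
    '[stop]\n'
    '"\n'
    '10002$2014-03-12$"[start]\n'
    'this is some more text\n'
    '[stop]\n'
    '"\n'
)


def clean_text(raw):
    """Hypothetical helper: drop the markers and the surrounding whitespace."""
    return raw.replace('[start]', '').replace('[stop]', '').strip()


# The quoted third field spans several lines; csv.reader returns it as one field.
reader = csv.reader(io.StringIO(SAMPLE, newline=''), delimiter='$')
header = next(reader)  # the first row supplies the dictionary keys
rows = [dict(zip(header, line)) for line in reader if line]  # skip blank rows
for row in rows:
    row['text'] = clean_text(row['text'])

print(rows)
```

Taking the keys from the header row avoids hard-coding 'id', 'date' and 'text', so the same loop should keep working if a column is renamed.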

Answer 2

You can use Python's built-in csv library to parse the file.

import csv


class Parser(object):
    START_TEXT = "[start]"
    END_TEXT = "[stop]"

    def __init__(self, filename):
        self.filename = filename


    def parse_file(self):
        elements = []

        with open(self.filename, 'r') as f:
            reader = csv.reader(f, delimiter='$')
            first_row = next(reader)

            key0 = first_row[0]
            key1 = first_row[1]
            key2 = first_row[2]

            for row in reader:
                elements.append({
                    key0: row[0],
                    key1: row[1],
                    key2: self.parse_text(row[2]),
                })

        return elements

    @classmethod
    def parse_text(cls, text):
        start_idx = text.index(cls.START_TEXT)
        end_idx = text.index(cls.END_TEXT)

        # Take everything between the end of START_TEXT and the start of END_TEXT.
        new_txt = text[start_idx + len(cls.START_TEXT):end_idx]

        return new_txt.strip()


p = Parser("infile.txt")
elements = p.parse_file()

print(elements)

Output:

[{'date': '2016-01-11', 'text': 'this is some text', 'id': '10001'}, {'date': '2014-03-12', 'text': 'this is some more text', 'id': '10002'}]

Answer 3

import csv

with open('f.txt') as fp:
    reader = csv.DictReader(fp, delimiter="$")
    data = list(reader)

for row in data:
    row.update({
        k:v.replace('[start]','').replace('[stop]','').replace('\n','')
        for k,v in row.items()})

print(data)
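
The loop above applies the replacements to every column and removes all newlines, including any inside the text itself. A variation that touches only the 'text' column and keeps just what sits between the markers might look like the sketch below. It assumes the sample from the question (without the leading indentation) and reads from an in-memory string. The names extract_text and parse are hypothetical helpers.

```python
import csv
import io
import re

# Sample data from the question; the leading indentation is assumed to be
# a formatting artifact and is left out here.
SAMPLE = (
    '"id"$"date"$"text"\n'
    '10001$2016-01-11$"[start]\n'
    'this is some text\n'
    '[stop]\n'
    '"\n'
    '10002$2014-03-12$"[start]\n'
    'this is some more text\n'
    '[stop]\n'
    '"\n'
)

# Capture everything between the markers; re.DOTALL lets '.' match newlines.
MARKED = re.compile(r'\[start\](.*?)\[stop\]', re.DOTALL)


def extract_text(raw):
    """Hypothetical helper: text between [start] and [stop], else raw unchanged."""
    match = MARKED.search(raw)
    return match.group(1).strip() if match else raw


def parse(stream):
    """Parse a '$'-delimited stream into plain dicts keyed by the header row."""
    result = []
    for row in csv.DictReader(stream, delimiter='$'):
        record = dict(row)  # plain dict, whatever mapping type DictReader yields
        record['text'] = extract_text(record['text'])
        result.append(record)
    return result


data = parse(io.StringIO(SAMPLE, newline=''))
print(data)
```

To read from disk instead, something like parse(open('f.txt', newline='')) inside a with block should behave the same way.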

3 answers

Answer 1
0 Accepted 2016-04-14 20:44:08

Answer 2
0 2016-04-14 20:54:09

Answer 3
0 2016-04-14 21:59:09
