How do I convert this text file into a dictionary?

Question

I have a file f that looks like this:

#labelA
there
is
something
here
#label_Bbb
here
aswell
...

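It can have many labels, any number of elements (str only) on a line, and several lines for each label. I would like to store this data in a dictionary like this: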

d = {'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell', ...}

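I have a few sub-questions:

How can I use the # character to know when a new entry starts?
How do I remove it and keep whatever follows until the end of the line?
How can I append every string on the following lines until # shows up again?
How do I stop when the file ends?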

Answer 1

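First, mydict holds the keys taken from the lines that start with #, and each value is a list (a list keeps the lines in the order they were appended). We append lines to this list until we reach the next line that starts with #. Then we only need to join each list of lines into a single string.

I am using Python 3. If you use Python 2, replace mydict.items() with mydict.iteritems() to iterate over the key-value pairs.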

mydict = dict()
with open("sample.csv") as inputs:
    for line in inputs:
        if line.startswith("#"):
            key = line.strip()[1:]
            mydict.setdefault(key,list())
        else:
            mydict[key].append(line.strip())

result = dict()
for key, vlist in mydict.items():
    result[key] = "".join(vlist)

print(result)

Output:

{'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell'}

Answer 2

The shortest solution, using the re.findall() function:

import re 

with open("lines.txt", 'r') as fh:
    d = {k:v.replace('\n', '') for k,v in re.findall(r'^#(\w+)\s([^#]+)', fh.read(), re.M)}

print(d)

Output:

{'label_Bbb': 'hereaswell', 'labelA': 'thereissomethinghere'}

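re.findall returns a list of tuples; each tuple contains two items, one for each of the two consecutive capturing groups (the label and the text that follows it).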
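To make the intermediate list of tuples visible, here is a small sketch that runs the same pattern on an in-memory string instead of a file. The `text` value is simply the sample from the question, and the names `text` and `pairs` are chosen for illustration only.

```python
import re

# Sample text copied from the question; in practice this would come from fh.read().
text = "#labelA\nthere\nis\nsomething\nhere\n#label_Bbb\nhere\naswell\n"

# Group 1 captures the label after '#'; group 2 captures everything up to the next '#'.
pairs = re.findall(r'^#(\w+)\s([^#]+)', text, re.M)
print(pairs)
# [('labelA', 'there\nis\nsomething\nhere\n'), ('label_Bbb', 'here\naswell\n')]

# Removing the newlines from each captured block gives the requested dictionary.
d = {k: v.replace('\n', '') for k, v in pairs}
print(d)
# {'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell'}
```

Note that the pattern relies on the data lines never containing a # character, since [^#]+ stops at the first one it meets.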

Answer 3

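d = {}
last_key = None
last_element = ''
with open('untitled.txt', 'r') as f:
    for line in f:
        if line.startswith('#'):
            # A new label begins: store the entry collected so far.
            if last_key:
                d[last_key] = last_element
            # Drop the leading '#' and the trailing newline.
            last_key = line[1:].rstrip('\n')
            last_element = ''
        else:
            last_element += line.rstrip('\n')

# Store the final entry, which no later label line flushes.
if last_key:
    d[last_key] = last_element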

Answer 4

Using collections.defaultdict:

from collections import defaultdict

d = defaultdict(list)

with open('f.txt') as file:
    for line in file:
        if line.startswith('#'):
            key = line.lstrip('#').rstrip('\n')
        else:
            d[key].append(line.rstrip('\n'))
for key in d:
    d[key] = ''.join(d[key])

Answer 5

As a single pass, without building a temporary dictionary:

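res = {}
with open("sample") as lines:
    # Assumes the first line of the file is a label line.
    line = next(lines, None)
    while line is not None:
        entry = ""
        # Consume the lines that follow until the next label or the end of the file.
        following = next(lines, None)
        while following is not None and not following.startswith("#"):
            entry += following.strip()
            following = next(lines, None)
        # Drop the leading '#' and the trailing newline from the label.
        res[line[1:].strip()] = entry
        line = following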

Answer 6

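I would do something like this (a sketch that assumes the file is named f.txt and begins with a label line):

result = {}
with open("f.txt") as fh:
    line = fh.readline()
    while line:  # readline() returns "" at the end of the file
        key = line[1:].strip()
        text = ""
        line = fh.readline()
        # Collect lines until the next label or the end of the file.
        while line and not line.startswith("#"):
            text += line.strip()
            line = fh.readline()

        result[key] = text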

Answer 7

Here is my approach:

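def eachChunk(stream):
  key = None
  value = ''
  for line in stream:
    line = line.rstrip('\n')
    if line.startswith('#'):
      # A new label begins: emit the previous chunk, if there is one.
      if key is not None:
        yield key, value
      key = line[1:]
      value = ''
    else:
      value += line
  # Emit the last chunk, unless the stream contained no label at all.
  if key is not None:
    yield key, value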

You can then quickly build the desired dictionary like this:

with open('f') as data:
  d = dict(eachChunk(data))
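To try the generator idea without creating a file, it can be fed an in-memory stream. The sketch below is self-contained: `iter_sections` is a hypothetical, compact variant of the generator above (not the answer's own function), and `sample` is the data from the question wrapped in io.StringIO.

```python
import io

def iter_sections(stream):
    """Yield (label, joined_text) pairs from an iterable of lines."""
    key, parts = None, []
    for line in stream:
        line = line.rstrip('\n')
        if line.startswith('#'):
            # A new label begins: emit the previous section, if any.
            if key is not None:
                yield key, ''.join(parts)
            key, parts = line[1:], []
        elif key is not None:
            # Lines before the first label are ignored.
            parts.append(line)
    if key is not None:
        yield key, ''.join(parts)

# io.StringIO behaves like an open text file, so no file on disk is needed.
sample = io.StringIO("#labelA\nthere\nis\nsomething\nhere\n#label_Bbb\nhere\naswell\n")
d = dict(iter_sections(sample))
print(d)
# {'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell'}
```

Because the generator only iterates over its argument, the same call works with an open file, a list of strings, or any other iterable of lines.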

7 solutions

Solution 1 7 2017-02-14 22:46:50

Solution 2 2 2017-02-14 22:53:39

Solution 3 2 2017-02-14 22:54:59

Solution 4 1 2017-02-14 22:50:58

Solution 5 1 2017-02-14 22:56:18

Solution 6 1 2017-02-14 23:00:47

Solution 7 1 2017-02-14 23:38:32
