How to convert this text file into a dictionary?
I have a file f that looks something like:
#labelA
there
is
something
here
#label_Bbb
here
aswell
...
It can have any number of labels, any number of elements (strings only) on a line, and several lines for each label. I would like to store this data in a dictionary like:
d = {'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell', ...}
I have a number of sub-questions:
First, mydict holds the keys, which come from the lines that start with # (with the # removed), and each value is a list (a list keeps the lines in the order they were appended). We append lines to this list until we reach the next line that starts with #. Then we just need to join the list of lines into a single string.
I am using Python 3; if you use Python 2, replace mydict.items() with mydict.iteritems() when iterating over the key-value pairs.
mydict = dict()
with open("sample.csv") as inputs:
    for line in inputs:
        if line.startswith("#"):
            key = line.strip()[1:]
            mydict.setdefault(key, list())
        else:
            mydict[key].append(line.strip())

result = dict()
for key, vlist in mydict.items():
    result[key] = "".join(vlist)

print(result)
Output:
{'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell'}
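To try this approach without creating a file, the same logic can be wrapped in a function that accepts any iterable of lines. This is only a minimal sketch: the function name parse_labels and the use of io.StringIO as a stand-in for the file are assumptions made for illustration, and so is the choice to ignore any text that appears before the first label.

```python
import io

def parse_labels(lines):
    """Collect the lines under each '#label' header and join them."""
    collected = {}
    key = None
    for line in lines:
        line = line.strip()
        if line.startswith("#"):
            key = line[1:]                  # drop the leading '#'
            collected.setdefault(key, [])
        elif key is not None and line:
            # Assumption: text before the first label is ignored.
            collected[key].append(line)
    # Join each list of lines into one string.
    return {k: "".join(v) for k, v in collected.items()}

# io.StringIO stands in for the open file from the question.
sample = io.StringIO(
    "#labelA\nthere\nis\nsomething\nhere\n#label_Bbb\nhere\naswell\n"
)
result = parse_labels(sample)
print(result)
```

Because the function only iterates over its argument, an open file object can be passed in the same way as the StringIO object.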
Shortest solution, using the re.findall() function:
import re

with open("lines.txt", 'r') as fh:
    d = {k: v.replace('\n', '') for k, v in re.findall(r'^#(\w+)\s([^#]+)', fh.read(), re.M)}

print(d)
The output:
{'label_Bbb': 'hereaswell', 'labelA': 'thereissomethinghere'}
re.findall returns a list of tuples; each tuple contains two items, one for each of the two consecutive capturing groups.
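To make the intermediate result visible, here is a small sketch that runs the same pattern on an in-memory string instead of a file. The sample text and the variable names text and pairs are made up for illustration.

```python
import re

# Shortened sample in the same format as the file from the question.
text = "#labelA\nthere\nis\n#label_Bbb\nhere\naswell\n"

# Group 1 is the label name, group 2 is everything up to the next '#'.
pairs = re.findall(r'^#(\w+)\s([^#]+)', text, re.M)
print(pairs)

# Remove the newlines that are still inside each captured value.
d = {k: v.replace('\n', '') for k, v in pairs}
print(d)
```

Note that the second group stops at the first # it meets, so with this pattern a value that itself contains a # would be cut short, and a label with no lines under it would not be matched at all.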
d = {}
last_key = None
last_element = ''
with open('untitled.txt', 'r') as f:
    line = f.readline()
    while line:
        if line.startswith('#'):
            if last_key:
                d[last_key] = last_element
            last_key = line[1:].strip()  # drop the leading '#' and the newline
            last_element = ''
        else:
            last_element += line.strip()  # drop the newline so the pieces join cleanly
        line = f.readline()
if last_key:
    d[last_key] = last_element  # store the last label
Use collections.defaultdict:
from collections import defaultdict

d = defaultdict(list)

with open('f.txt') as file:
    for line in file:
        if line.startswith('#'):
            key = line.lstrip('#').rstrip('\n')
        else:
            d[key].append(line.rstrip('\n'))

for key in d:
    d[key] = ''.join(d[key])
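What defaultdict(list) saves here is the setdefault call: a missing key is created with an empty list the first time it is accessed. The short sketch below compares the two styles; the keys and values are made up for illustration.

```python
from collections import defaultdict

# With defaultdict, the key is created on first access.
d = defaultdict(list)
d['labelA'].append('there')
d['labelA'].append('is')

# The equivalent with a plain dict needs setdefault (or an explicit check).
plain = {}
plain.setdefault('labelA', []).append('there')
plain.setdefault('labelA', []).append('is')

joined = {k: ''.join(v) for k, v in d.items()}
print(joined)
```

One consequence of creating keys only on access is that, in the loop above, a label with no lines under it never becomes a key in d.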
As a single pass, without building intermediate dictionaries:
res = {}
line = ""
with open("sample") as lines:
    try:
        line = next(lines)
        while True:
            entry = ""
            if line.startswith("#"):
                nxt = next(lines)
                while not nxt.startswith("#"):
                    entry += nxt.strip()
                    nxt = next(lines)
                res[line[1:].strip()] = entry
                line = nxt
            else:
                line = next(lines)  # skip anything before the first label
    except StopIteration:
        if line.startswith("#"):
            res[line[1:].strip()] = entry  # Catch the last entry
I would do something like this (a sketch that assumes the file is named f and starts with a label line):
d = {}
with open('f') as fh:
    lines = fh.read().splitlines()
i = 0
while i < len(lines):
    key = lines[i][1:]  # label line: drop the leading '#'
    i += 1
    text = ""
    while i < len(lines) and not lines[i].startswith("#"):
        text += lines[i]
        i += 1
    d[key] = text
Here is my approach:
def eachChunk(stream):
    key = None
    value = ''
    for line in stream:
        line = line.rstrip('\n')
        if line.startswith('#'):
            if key is not None:
                yield key, value
            key = line[1:]
            value = ''
        else:
            value += line
    if key is not None:
        yield key, value  # the last label
You can then quickly create the desired dictionary like this:
with open('f') as data:
    d = dict(eachChunk(data))