
How to convert this text file into a dictionary?

I have a file f that looks something like:

#labelA
there
is
something
here
#label_Bbb
here
aswell
...

It can have any number of labels, any number of elements (strings only) on a line, and several lines for each label. I would like to store this data in a dictionary like:

d = {'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell', ...}

I have a number of sub-questions:

  1. How can I use the # character to detect when a new entry starts?
  2. How can I remove it and keep whatever follows it up to the end of the line?
  3. How can I append every string on the following lines until # appears again?
  4. How can I stop when the file ends?

First, mydict stores each label (the text after the #) as a key, and the value is a list, because a list keeps the lines in the order they were appended. We append lines to this list until we reach the next line that starts with #. Then we just need to join each list of lines into one single string.

I am using Python 3; if you use Python 2, replace mydict.items() with mydict.iteritems() to iterate over the key-value pairs.

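mydict = dict()
with open("sample.csv") as inputs:
    for line in inputs:  # iterating over the file stops by itself at end of file
        if line.startswith("#"):
            key = line.strip()[1:]            # label without the leading '#'
            mydict.setdefault(key, list())    # start an empty list for this label
        else:
            mydict[key].append(line.strip())  # collect lines under the current label

# join each label's lines into one string
result = dict()
for key, vlist in mydict.items():
    result[key] = "".join(vlist)

print(result)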

Output:

{'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell'}

The shortest solution, using the re.findall() function:

import re 

with open("lines.txt", 'r') as fh:
    d = {k:v.replace('\n', '') for k,v in re.findall(r'^#(\w+)\s([^#]+)', fh.read(), re.M)}

print(d)

The output:

{'label_Bbb': 'hereaswell', 'labelA': 'thereissomethinghere'}

re.findall returns a list of tuples; each tuple holds two items, one for each capturing group: the label, and the text up to the next #.
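To see what those tuples look like before the dictionary comprehension cleans them up, here is a small sketch that runs the same pattern on the sample text held in a string (the names `text`, `pairs` and `d` are only for illustration). Group 2 still contains the newlines, which is why the answer calls `replace('\n', '')`.

```python
import re

text = "#labelA\nthere\nis\nsomething\nhere\n#label_Bbb\nhere\naswell\n"

# Group 1 captures the label after '#'; group 2 captures everything up to the next '#'.
# re.M makes '^' match at the start of every line, not just the start of the string.
pairs = re.findall(r'^#(\w+)\s([^#]+)', text, re.M)
print(pairs)
# [('labelA', 'there\nis\nsomething\nhere\n'), ('label_Bbb', 'here\naswell\n')]

# Removing the newlines from group 2 gives the requested dictionary.
d = {k: v.replace('\n', '') for k, v in pairs}
print(d)
# {'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell'}
```

Because `[^#]+` stops at any `#`, this pattern assumes the text lines themselves never contain a `#` character.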

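with open('untitled.txt', 'r') as f:
    d = {}
    last_key = None
    last_element = ''
    line = f.readline()
    while line:  # readline() returns '' only at end of file
        if line.startswith('#'):
            if last_key is not None:
                d[last_key] = last_element  # save the previous entry
            last_key = line.rstrip('\n')[1:]  # drop the '#' and the newline
            last_element = ''
        else:
            last_element += line.rstrip('\n')
        line = f.readline()

if last_key is not None:
    d[last_key] = last_element  # save the final entry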

Use collections.defaultdict:

from collections import defaultdict

d = defaultdict(list)

with open('f.txt') as file:
    for line in file:
        if line.startswith('#'):
            key = line.lstrip('#').rstrip('\n')
        else:
            d[key].append(line.rstrip('\n'))
for key in d:
    d[key] = ''.join(d[key])

As a single pass without making interim dictionaries:

res = {}
with open("sample") as lines:
    try:
        line = next(lines)  # Python 3: use next(it); file objects have no .next()
        while True:
            entry = ""
            if line.startswith("#"):
                nxt = next(lines)  # avoid shadowing the built-in next()
                while not nxt.startswith("#"):
                    entry += nxt.rstrip("\n")
                    nxt = next(lines)
            res[line[1:].rstrip("\n")] = entry
            line = nxt
    except StopIteration:
        res[line[1:].rstrip("\n")] = entry  # Catch the last entry
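Another single-pass option, not taken from any of the answers here, is `itertools.groupby`, which splits the lines into consecutive runs of label lines and text lines. A minimal sketch, assuming labels are lines starting with `#`, that any text before the first label can be ignored, and that a label with no text should map to an empty string (`parse_labels` is a hypothetical name; `io.StringIO` stands in for the real file):

```python
import io
from itertools import groupby


def parse_labels(stream):
    """Map each '#label' line to the concatenation of the lines that follow it."""
    result = {}
    key = None
    stripped = (line.rstrip('\n') for line in stream)
    # groupby yields runs of consecutive lines sharing the same key:
    # True for label lines, False for text lines.
    for is_label, group in groupby(stripped, key=lambda line: line.startswith('#')):
        lines = list(group)  # materialize the run before groupby advances
        if is_label:
            # consecutive labels with no text between them get empty strings
            for label in lines:
                key = label[1:]
                result[key] = ''
        elif key is not None:
            result[key] = ''.join(lines)
        # text before the first label (key is None) is skipped
    return result


sample = io.StringIO("#labelA\nthere\nis\nsomething\nhere\n#label_Bbb\nhere\naswell\n")
print(parse_labels(sample))
# {'labelA': 'thereissomethinghere', 'label_Bbb': 'hereaswell'}
```

Because `groupby` only groups consecutive items, each run of text lines belongs to the most recent label, so no lookahead or `StopIteration` handling is needed.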

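I would do something like this (it assumes the first line of the file is a label):

d = dict()
with open("f.txt") as f:
    line = f.readline()
    while line.startswith("#"):  # each pass handles one labelled block
        key = line.rstrip("\n")[1:]
        text = ""
        line = f.readline()
        # readline() returns "" at end of file, which ends both loops
        while line and not line.startswith("#"):
            text += line.rstrip("\n")
            line = f.readline()
        d[key] = text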

Here is my approach:

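def eachChunk(stream):
  """Yield (label, text) pairs, one per '#label' block in stream."""
  key = None
  value = ''
  for line in stream:
    line = line.rstrip('\n')
    if line.startswith('#'):
      if key is not None:
        yield key, value  # finish the previous block
      key = line[1:]
      value = ''
    else:
      value += line
  if key is not None:
    yield key, value  # the last block ends at end of file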

You can then create the desired dictionary like this:

with open('f') as data:
  d = dict(eachChunk(data))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 