
Parsing txt file into dictionary in Python

There are a lot of posts about parsing a text file in Python but I have a special case where the txt file isn't always pretty.

In a perfect world, the key and value would be separated by an equals sign on the same line and you could iterate through line by line and store the values into a dictionary. But of course this isn't a perfect world. Here is a snippet of my txt file:

Map ID  = 
26
Device Type = iPhone OS
Tutorial viewed = false
Last 5 errors = (
    142,
    752,
    142,
    752,
    752
)

IP of Device     = XXX.XX.XXX.XX

It is very inconsistent in terms of keeping things on the same line. For example, sometimes

Device Type = iPhone OS

sometimes

Device Type = iPhone
OS

and sometimes

Device Type = 
iPhone OS

What is the best way to go through these files so I can get a dictionary similar to the code below no matter what kind of horrible formatting occurs:

{'Map ID': 26,
 'Device Type': iPhone OS,
 'Tutorial viewed': false,
 'Last 5 errors': {142, 752, 142, 752, 752},
 'IP of Device': XXX.XX.XXX.XX}

There are also many lines in the txt file that don't contain equals signs; some need to be ignored and some are delimited by a colon (:), but that's another story.

Assuming that at least the entire key is always on the same line as the equals sign, you can iterate through the lines, add a new entry if the line is a 'key' line and add to the last key's entry otherwise:

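d = {}
key = None
for line in infile:  # infile: an open file object (any iterable of lines)
    if "=" in line:
        # Split on the first "=" only, then strip padding from both parts
        key, val = map(str.strip, line.split("=", 1))
        d[key] = val
    elif key is not None:  # Skip stray lines that appear before the first key
        d[key] += line.strip()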

Also, = must never appear in a value. Output for your example:

{'IP of Device': 'XXX.XX.XXX.XX', 'Device Type': 'iPhone OS', 'Map ID': '26', 
 'Tutorial viewed': 'false', 'Last 5 errors': '(142,752,142,752,752)'}
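
One thing to watch with the loop above: continuation lines are concatenated directly, so a value split as "iPhone" / "OS" would come back as 'iPhoneOS'. A minimal sketch of one way around that, assuming it is acceptable to join fragments with a single space (the function name parse_lines is made up for illustration):

```python
import io

def parse_lines(lines):
    """Collect 'key = value' pairs, joining continuation lines with a space."""
    pieces = {}   # key -> list of value fragments, in order of appearance
    key = None
    for line in lines:
        text = line.strip()
        if "=" in text:
            # A new key starts here; split on the first "=" only
            key, first = (part.strip() for part in text.split("=", 1))
            pieces[key] = [first] if first else []
        elif key is not None and text:
            # Non-empty continuation line for the most recent key
            pieces[key].append(text)
    return {k: " ".join(v) for k, v in pieces.items()}

sample = io.StringIO("Device Type = iPhone\nOS\nMap ID  = \n26\n")
print(parse_lines(sample))  # {'Device Type': 'iPhone OS', 'Map ID': '26'}
```

The trade-off is that multi-line lists also pick up spaces, so the errors entry would read '( 142, 752, 142, 752, 752 )' rather than '(142,752,142,752,752)'.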

Assuming that the delimiter (in this case '=') is never part of the data values, I'd do something like this:

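mydict = {}
key, val = None, ''
for line in dirty_file:  # dirty_file: any iterable of lines, e.g. an open file
    if '=' in line:
        if key is not None:
            mydict[key] = val  # You might want to do type conversions here
        # Split on the first '=' only and strip padding around key and value
        key, val = (part.strip() for part in line.split('=', 1))
    else:
        val += line.strip()

if key is not None:  # For the final item
    mydict[key] = val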
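
For the type conversions mentioned in the comment, something along these lines might do. This is only a sketch: the helper name convert is hypothetical, and it assumes booleans are spelled true/false, lists are wrapped in parentheses and comma-separated, and anything that is not an integer should stay a string. It returns a list rather than a set so that repeated error codes are not dropped.

```python
def convert(raw):
    """Best-effort conversion of a raw string value (hypothetical helper)."""
    text = raw.strip()
    if text.lower() in ("true", "false"):
        return text.lower() == "true"
    if text.startswith("(") and text.endswith(")"):
        # Parenthesised, comma-separated list: convert each non-empty item
        items = [item.strip() for item in text[1:-1].split(",")]
        return [convert(item) for item in items if item]
    try:
        return int(text)
    except ValueError:
        return text  # Not an integer: keep the string as-is

print(convert("26"))                      # 26
print(convert("false"))                   # False
print(convert("(142,752,142,752,752)"))   # [142, 752, 142, 752, 752]
print(convert("iPhone OS"))               # iPhone OS
```

It could be called as mydict[key] = convert(val) at both places where the dictionary is written.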

The way I see it, you need to aggregate lines so that each aggregated group contains exactly one "=" sign, as that is your best bet for a separator. The logic for parsing the error tuple into a set or the "false" string into a boolean is up to your implementation, but don't forget to strip the newline after the initial parsing. A piece of code might look like this:

split = myText.split("=")            # myText: the whole file as one string
firstKey = split[0].strip()
secondSplit = split[1].split("\n")   # value lines, followed by the next key
firstVal = secondSplit[:-1]
secondKey = secondSplit[-1].strip()

This is just an example, not a generalization. You would have to come up with the logic that treats the first and last pieces as special cases, while the middle ones are all treated pretty much the same.
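
A rough generalization of that idea could look like the sketch below. It assumes the whole file fits in memory, that "=" never appears inside a value, and that the line immediately before each "=" is the key; the function name parse_by_split is made up, and values are returned as lists of stripped lines for later processing.

```python
def parse_by_split(text):
    """Split on '=' and treat the first and last chunks as special cases."""
    chunks = text.split("=")
    result = {}
    key = chunks[0].strip()            # First chunk: holds only the first key
    for chunk in chunks[1:-1]:         # Middle chunks: value lines + next key
        lines = chunk.strip().split("\n")
        result[key] = [ln.strip() for ln in lines[:-1] if ln.strip()]
        key = lines[-1].strip()
    if len(chunks) > 1:                # Last chunk: holds only the last value
        result[key] = [ln.strip() for ln in chunks[-1].split("\n") if ln.strip()]
    return result

print(parse_by_split("Map ID  = \n26\nDevice Type = iPhone OS\n"))
# {'Map ID': ['26'], 'Device Type': ['iPhone OS']}
```

Lines without an equals sign that should be ignored would still end up inside the preceding value here, so they would need filtering in a separate step.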

Don't know how the rest of your file looks but this might work:

d = {}
key = None
value = ''
with open(infile) as fin:  # infile: path to the text file
    for line in fin:
        if '=' in line:  # New key starting.
            if key:
                d[key] = value
            # Split once, on the first '=' only.
            left, right = line.split('=', 1)
            key = left.strip()
            value = right.strip()
        else:  # Only additional value in line.
            value += line.strip()

if key:  # Store the final pair, which the loop itself never writes.
    d[key] = value

Can't comment yet unfortunately, but you're right, I changed the dictionary name.

