
Python: Need help creating dictionaries from text files and splitting a list

I want to save data in text files and create dictionaries from those files, which I'll pass to a function later on.

Here's my code:

def lesson_dictionary(filename):
    print "Reading file ", filename
    with open(filename) as f:
        mylist = f.read().strip().split() 
        dictionary = OrderedDict(zip(mylist[::2], mylist[1::2])) #keep keys/values in same order as declared in mylist
        print dictionary
    return dictionary

With a sample file named sample.txt containing two columns of key/value pairs separated by a space, it works fine. For example,

a b

c d

e f

yields an OrderedDict like so:

OrderedDict([('a', 'b'), ('c', 'd'), ('e', 'f')])

BUT if I change the code and the content of the .txt file, it breaks. For example, if sample2.txt included:

a:b

c:d

e:f

and my code is

def lesson_dictionary(filename):
    print "Reading file ", filename
    with open(filename) as f:
        mylist = f.read().strip().split(':') #CHANGED: split string at colon!
        dictionary = OrderedDict(zip(mylist[::2], mylist[1::2]))
        print dictionary
    return dictionary

I get the following output:

OrderedDict([('a', 'b \nc'), ('d\ne', 'f')])

What's happening? Why did strip() work for the first .txt file but not for the second? Thanks in advance for any help.

The original split() split on whitespace, and \n counts as whitespace. By changing to split(':') you've stopped splitting at the end of each line, so the end of one line is merged with the start of the next, with a newline character in between. strip() only removes whitespace at the very start and end of the whole string, so it never touched the newlines between lines in either version. I don't think there's an easy way to fix it except to read the file one line at a time.
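To see the difference concretely, here is a small sketch that runs both calls on an in-memory string rather than a file. The `text` value is an assumption that mirrors the contents of sample2.txt.

```python
# Hypothetical stand-in for the contents of sample2.txt.
text = "a:b\nc:d\ne:f\n"

# split() with no argument splits on any run of whitespace, newlines included.
by_whitespace = text.strip().split()

# split(':') splits only on colons, so the newlines stay inside the pieces.
by_colon = text.strip().split(':')

print(by_whitespace)
print(by_colon)
```

Printing the intermediate list like this is usually the quickest way to spot where the pairing goes wrong.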

Edit: Some code to demonstrate.

dictionary = OrderedDict()
with open(filename) as f:
    for line in f:
        key, value = line.split(':')
        dictionary[key.strip()] = value.strip()

Or more in the spirit of your original:

with open(filename) as f:
    mylist = [line.strip().split(':') for line in f]
    dictionary = OrderedDict(mylist)

The second form has the disadvantage that it only strips whitespace from the ends of each line, not from around the colon, so a line like "a : b" would give the key 'a ' and the value ' b'. Based on your example, you might need that.
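If the files may contain blank lines or stray spaces around the colon, a slightly more defensive version of the first form could look like the sketch below. The helper name `parse_pairs`, its `sep` parameter, and the sample text are assumptions for illustration, not part of the original code.

```python
import io
from collections import OrderedDict


def parse_pairs(lines, sep=':'):
    """Build an OrderedDict from lines of the form 'key<sep>value'."""
    result = OrderedDict()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of failing on them
        # Split on the first separator only, so values may contain sep.
        key, value = line.split(sep, 1)
        result[key.strip()] = value.strip()
    return result


# io.StringIO stands in for an open file here.
sample = io.StringIO("a : b\n\nc:d \n e: f\n")
pairs = parse_pairs(sample)
print(pairs)
```

A non-blank line with no separator at all still raises ValueError in this sketch, which is probably what you want for malformed input.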

split() without a separator splits on any whitespace, which includes newlines as well as tabs and spaces. When you split on a colon, that no longer applies, so the newlines show up in your output. Try:

dictionary = OrderedDict(l.strip().split(':') for l in f)

If you're creating the input files yourself, I believe json would be better suited for this problem.

You can use it like this:

import json

#write the dictionary to a file
with open(filename, 'w') as outfile:
    json.dump(someDictionary, outfile)

#read the data back in
with open(filename) as infile:
    newDictionary = json.load(infile)
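Since the question relies on OrderedDict, note that json.load returns a plain dict by default. A minimal round-trip sketch that keeps the OrderedDict type on the way back in uses the `object_pairs_hook` argument; the temporary path and the sample data here are assumptions.

```python
import json
import os
import tempfile
from collections import OrderedDict

original = OrderedDict([('a', 'b'), ('c', 'd'), ('e', 'f')])

# Hypothetical file location, created in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'lesson.json')

# json.dump writes the keys in the dictionary's iteration order.
with open(path, 'w') as outfile:
    json.dump(original, outfile)

# object_pairs_hook rebuilds each JSON object as an OrderedDict.
with open(path) as infile:
    restored = json.load(infile, object_pairs_hook=OrderedDict)

print(restored)
```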

Have you tried printing out the contents of myList?

myList = ["a", "b\nc", "d\ne", "f"]

Replace the colons with spaces first if you want them to behave the same way:

myList = f.read().replace(":", " ").split()

Or, if you want to split them into key/value pairs, use slicing to zip the even and odd elements together after that replacement:

s = f.read().replace(":", " ").split()
myDict = dict(zip(s[::2], s[1::2]))

If you want your code to be delimiter-neutral, i.e. to handle a:b, a b, a#b and such, use re.split() instead of the regular split().

import re
from collections import OrderedDict

pattern = re.compile(r"[^\w]")     # any single non-word character
with open(filename, "rt") as fr:
    dictionary = OrderedDict(pattern.split(l.strip()) for l in fr)
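One caveat with a single-character pattern is that a delimiter such as " : " produces more than two pieces per line, and OrderedDict then rejects the line. A variation, as a sketch only: match a run of non-word characters with `\W+` and cap the split at one. The `lines` list is a hypothetical stand-in for the file.

```python
import re
from collections import OrderedDict

# One or more non-word characters count as a single delimiter.
delimiter = re.compile(r"\W+")

# Hypothetical input lines using different delimiters.
lines = ["a:b", "c d", "e#f", "g : h"]

# maxsplit=1 guarantees at most two pieces per line.
parsed = OrderedDict(
    delimiter.split(line.strip(), maxsplit=1) for line in lines
)

print(parsed)
```

This only works when the keys consist of word characters (letters, digits, underscore); a key containing punctuation would be cut at the first such character.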
