Python: Need help creating dictionaries from text files and splitting a list

Question

I want to save data in text files and create dictionaries from those files, which I'll pass to a function later on.

Here's my code:

from collections import OrderedDict

def lesson_dictionary(filename):
    print "Reading file ", filename
    with open(filename) as f:
        mylist = f.read().strip().split() 
        dictionary = OrderedDict(zip(mylist[::2], mylist[1::2])) #keep keys/values in same order as declared in mylist
        print dictionary
    return dictionary

With a sample file named sample.txt containing two columns of key/value pairs separated by a space, it works fine. For example,

a b

c d

e f

yields an OrderedDict like so:

OrderedDict([('a', 'b'), ('c', 'd'), ('e', 'f')])

BUT if I change the code and the content of the .txt file, it breaks. For example, if sample2.txt included:

a:b

c:d

e:f

and my code is

def lesson_dictionary(filename):
    print "Reading file ", filename
    with open(filename) as f:
        mylist = f.read().strip().split(':') #CHANGED: split string at colon!
        dictionary = OrderedDict(zip(mylist[::2], mylist[1::2]))
        print dictionary
    return dictionary

I get the following output:

OrderedDict([('a', 'b \nc'), ('d\ne', 'f')])

What's happening? Why did strip() work for the first .txt file but not for the second? Thanks in advance for any help.

Answer 1

The original split() split on whitespace, and \n is considered whitespace. By changing to split(':') you've removed the split at the end of each line, so the end of one line is merged with the start of the next, with a newline character in the middle. strip() doesn't help here because it only removes whitespace from the two ends of the whole string, not from between the lines. I don't think there's an easy way to fix it except to read the file one line at a time.
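To see the difference directly, you can run both calls on the same string. This is only a sketch: the string below is a hypothetical stand-in for what f.read() would return for sample2.txt, assuming three lines and no stray spaces.

```python
# Hypothetical stand-in for f.read() on sample2.txt
text = "a:b\nc:d\ne:f\n"

# No argument: split on any run of whitespace, newlines included
by_whitespace = text.strip().split()

# Explicit separator: split only on ':', so newlines stay inside the pieces
by_colon = text.strip().split(":")

print(by_whitespace)
print(by_colon)
```

The second list is what gets zipped into pairs in the question, which is why a newline ends up inside a value and inside a key.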

Edit: Some code to demonstrate.

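from collections import OrderedDict

dictionary = OrderedDict()
with open(filename) as f:
    for line in f:
        # each line is expected to look like "key:value"
        key, value = line.split(':')
        # strip() removes the trailing newline and any spaces around the colon
        dictionary[key.strip()] = value.strip()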

Or more in the spirit of your original:

with open(filename) as f:
    mylist = [line.strip().split(':') for line in f]
    dictionary = OrderedDict(mylist)

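The second form has the disadvantage of not stripping whitespace from around each key and value: strip() there only trims the ends of the whole line, so a line like "a : b" would produce the key "a " and the value " b". Based on your example output, you might need that stripping.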
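If you want both the per-line reading and the whitespace stripping in one place, something along these lines might work. It is a sketch, not a drop-in replacement: parse_pairs and its sep parameter are names made up for this example, and it assumes every non-blank line contains the separator (a line without one raises ValueError).

```python
from collections import OrderedDict

def parse_pairs(lines, sep=":"):
    """Build an OrderedDict from 'key<sep>value' lines (hypothetical helper)."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, value = line.split(sep, 1)  # split only on the first separator
        pairs.append((key.strip(), value.strip()))
    return OrderedDict(pairs)

# Any iterable of lines works, including an open file object
sample = ["a : b \n", "\n", "c:d\n", "e:f\n"]
result = parse_pairs(sample)
print(result)
```

With a real file you would pass the file object itself, e.g. parse_pairs(f) inside the with block.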

Answer 2

split() without a separator splits on any whitespace, which includes newlines as well as tabs and spaces. When you split on a colon, only the colon is treated as a separator, so the newlines stay in your output. Try:

dictionary = OrderedDict(l.strip().split(':') for l in f)

Answer 3

If you're creating the input files yourself, I believe json would be better suited for this problem.

You can use it like this:

import json

# write the dictionary to a file (the with block closes it afterwards)
with open(filename, 'w') as outfile:
    json.dump(someDictionary, outfile)

#read the data back in
with open(filename) as infile:
    newDictionary = json.load(infile)
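Since the question cares about key order, note that json.load returns a plain dict by default, and plain dicts only guarantee insertion order from Python 3.7 on. One way to get an OrderedDict back is the object_pairs_hook argument. The sketch below uses a temporary file whose path is made up purely for the demonstration.

```python
import json
import os
import tempfile
from collections import OrderedDict

original = OrderedDict([("a", "b"), ("c", "d"), ("e", "f")])

# Hypothetical temporary path used only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "lesson.json")

with open(path, "w") as outfile:
    json.dump(original, outfile)

with open(path) as infile:
    # object_pairs_hook receives the key/value pairs in file order
    restored = json.load(infile, object_pairs_hook=OrderedDict)

print(restored)
```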

Answer 4

Have you tried printing out the contents of myList? After splitting on the colon it looks like this:

myList = ["a", "b\nc", "d\ne", "f"]

Replace the colons with spaces first if you want them to behave the same way:

myList = f.read().replace(":", " ").split()

Or, if you want to split them into key/value pairs, just use slicing to zip the even and odd elements together:

s = f.read().replace(":", " ").split()
myDict = dict(zip(s[::2], s[1::2]))

Answer 5

If you want your code to be delimiter-neutral, i.e. to handle a:b, a b, a#b and the like, use re.split() instead of the regular split().

import re
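from collections import OrderedDict
pattern = re.compile(r"\W+")     # one or more non-word characters
with open(filename, "rt") as fr:
    # maxsplit=1 keeps each line to exactly one key and one value
    dictionary = OrderedDict(pattern.split(l.strip(), maxsplit=1) for l in fr)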
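Here is a small self-contained check of that idea on in-memory lines rather than a file. The sample lines are invented for illustration, and the pattern assumes keys consist only of word characters (letters, digits, underscore); a key containing punctuation would be cut at the first such character.

```python
import re
from collections import OrderedDict

pattern = re.compile(r"\W+")  # one or more non-word characters

# Hypothetical lines using different delimiters
lines = ["a:b\n", "c d\n", "e#f\n", "g : h\n"]

# maxsplit=1 stops after the first delimiter, giving [key, value]
pairs = [pattern.split(line.strip(), maxsplit=1) for line in lines]
result = OrderedDict(pairs)
print(result)
```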

5 answers

solution1
4 ACCEPTED 2012-05-10 22:06:29

solution2
2 2012-05-10 22:11:13

solution3
0 2012-05-10 21:56:29

solution4
0 2012-05-10 21:57:56

solution5
0 2016-04-04 16:49:10
