
Reading a file into a dictionary

I was wondering if there is a way to read delimited text into a dictionary. I have been able to get it into lists without a problem. Here is the code:

def _demo_fileopenbox():        
    msg  = "Pick A File!"
    msg2 = "Select a country to learn more about!"
    title = "Open files"
    default="*.py"
    f = fileopenbox(msg,title,default=default)
    writeln("You chose to open file: %s" % f)
    c = []
    a = []
    p = []

    with open(f,'r') as handle:
        reader = csv.reader(handle, delimiter = '\t')  
        for row in reader:
            c = c + [row[0]]
            a = a + [row[1]]
            p = p + [row[2]]
        while 1:
            reply = choicebox(msg=msg2, choices= c )
            writeln( reply + ";\tArea: " + a[(c.index(reply))] + " square miles \tPopulation: " + p[(c.index(reply))] )

That code builds three lists because each line of the file holds a country name, its area, and its population. I set it up that way so that when I choose a country, I get the corresponding area and population. Some people say a dictionary is a better approach, but I don't think I can put three things into one slot in a dictionary. I need the country name to be the key, with the population and area as the data for that key. Two dictionaries could probably work, but I don't know how to get from a file to a dictionary. Any help, please?

You could use two dictionaries, but you could also use a 2-tuple like this:

countries = {}

# ... other code as before

    for row in reader:
        countries[row[0]] = (row[1], row[2])

Then you can iterate through it all like this:

for country, (area, population) in countries.items():
    # ... Do stuff with country, area and population

... or you can access data on a specific country like this:

area, population = countries["USA"]
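
To make the tuple approach concrete, here is a minimal self-contained sketch. It reads from an in-memory string rather than a real file, the sample rows are made up for illustration, and load_countries is a hypothetical helper name that does not appear in the original code:

```python
import csv
import io

# Made-up sample data standing in for the tab-delimited file.
SAMPLE = "USA\t3800000\t330000000\nCanada\t3900000\t38000000\n"

def load_countries(handle):
    """Return {country: (area, population)} from a tab-delimited file object."""
    countries = {}
    for row in csv.reader(handle, delimiter='\t'):
        # Values stay as strings, exactly as csv.reader yields them.
        countries[row[0]] = (row[1], row[2])
    return countries

countries = load_countries(io.StringIO(SAMPLE))
area, population = countries["USA"]
print(area, population)
```

With a real file you would pass the handle from open(f) instead of the StringIO object.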

Finally, if you're planning to add more information in the future you might instead want to use a class as a more elegant way to hold the information - this makes it easier to write code which doesn't break when you add new stuff. You'd have a class something like this:

class Country(object):

    def __init__(self, name, area, population):
        self.name = name
        self.area = area
        self.population = population

And then your reading code would look something like this:

for row in reader:
    countries[row[0]] = Country(row[0], row[1], row[2])

Or if you have the constructor take the entire row rather than individual items you might find it easier to extend the format later, but you're also coupling the class more closely to the representation in the file. It just depends on how you think you might extend things later.
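
One way to get the "takes the entire row" variant without giving up the plain constructor is an alternate constructor. This is only a sketch: from_row is a hypothetical method name, and the rows below are made-up stand-ins for what csv.reader would produce.

```python
class Country(object):

    def __init__(self, name, area, population):
        self.name = name
        self.area = area
        self.population = population

    @classmethod
    def from_row(cls, row):
        # Hypothetical helper: knowledge of the column order lives here only.
        name, area, population = row
        return cls(name, area, population)

# Made-up rows in the shape csv.reader would yield.
rows = [["USA", "3800000", "330000000"], ["Canada", "3900000", "38000000"]]

countries = {}
for row in rows:
    country = Country.from_row(row)
    countries[country.name] = country

print(countries["Canada"].area)
```

If the file later gains a column, only from_row needs to change, while code that builds Country objects by hand keeps working.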

Then you can look things up like this:

country = countries["USA"]
print("Area is: %s" % (country.area,))

This has the advantage that you can add new methods to do more clever stuff in the future. For example, a method which returns the population density:

class Country(object):

    # ...

    def get_density(self):
        # csv.reader yields strings, so convert before dividing.
        return float(self.population) / float(self.area)
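
One thing to watch for: csv.reader yields strings, so the division needs numeric values. A minimal sketch, assuming the area and population columns contain plain numbers (the figures here are made up), converts them once in the constructor:

```python
class Country(object):

    def __init__(self, name, area, population):
        self.name = name
        # Assumes both fields are plain numeric strings such as "175000".
        self.area = float(area)
        self.population = int(population)

    def get_density(self):
        # People per unit of area; the float area avoids integer division.
        return self.population / self.area

mordor = Country("Mordor", "175000", "3000000")
print(round(mordor.get_density(), 2))
```

If the file contains values like "300 million", the conversion will raise ValueError, so this only fits files with clean numeric columns.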

In general I would recommend classes over something like nested dictionaries once you are storing more than a couple of items per entry. They make your code easier to read and easier to extend later.

As with most programming issues, however, other approaches will work - it's a case of choosing the method that works best for you.

Something like this should work:

myDict = {}
for row in reader:
    country, area, population = row
    myDict[country] = {'area': area, 'population': population}

Note that you'll have to add some error checking so that your code doesn't break if a row contains more or fewer than three delimited items.
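
As a rough sketch of that error checking, the version below skips any row that does not have exactly three fields and remembers its line number. The sample text is made up, and read_countries is a hypothetical helper name; whether to skip, log, or raise on a bad row is a choice that depends on your data.

```python
import csv
import io

# Made-up sample: the second line has only two fields.
SAMPLE = "Mordor\t175000\t3000000\nGondor\t250000\nRohan\t90000\t600000\n"

def read_countries(handle):
    """Return (dict of valid rows, list of skipped line numbers)."""
    result = {}
    skipped = []
    for line_number, row in enumerate(csv.reader(handle, delimiter='\t'), 1):
        if len(row) != 3:
            # Record malformed rows instead of failing on unpacking.
            skipped.append(line_number)
            continue
        country, area, population = row
        result[country] = {'area': area, 'population': population}
    return result, skipped

my_dict, skipped = read_countries(io.StringIO(SAMPLE))
print(sorted(my_dict), skipped)
```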

You can access the values as follows (they are strings, because that is what csv.reader returns):

>>> myDict['Mordor']['area']
'175000'
>>> myDict['Mordor']['population']
'3000000'

The value stored in the dictionary can be a tuple of the area and population. So when you read in the file, you can do something like this:

countries_dict = {}

for row in reader:
    countries_dict[row[0]] = (row[1], row[2])

data = []

with open(f,'r') as handle:
    reader = csv.reader(handle, delimiter = '\t')  
    for row in reader:
        (country, area, population) = row
        data.append({'country': country, 'area': area, 'population': population})

data would then be a list of dictionaries.

But I'm not sure that's really a better approach, because it would use more memory. Another option is just a list of lists:

with open(f) as handle:
    data = list(csv.reader(handle, delimiter='\t'))
print(data)
# [['USA', 'big', '300 million'], ...]
