Python- File Parsing

Question

Write a program which reads a text file called input.txt containing an arbitrary number of lines of the form "<city>, <country>", records this information using a dictionary, and finally outputs to the screen a list of the countries represented in the file and the number of cities in each.

For example, if input.txt contained the following:

New York, US
Angers, France
Los Angeles, US
Pau, France
Dunkerque, France
Mecca, Saudi Arabia

The program would output the following (in some order):

Saudi Arabia : 1
US : 2
France : 3

My code:

from os import dirname

def parseFile(filename, envin, envout = {}):
    exec "from sys import path" in envin
    exec "path.append(\"" + dirname(filename) + "\")" in envin
    envin.pop("path")
    lines = open(filename, 'r').read()
    exec lines in envin
    returndict = {}
    for key in envout:
        returndict[key] = envin[key]
    return returndict

I get "SyntaxError: invalid syntax" when I run it on my file, input.txt.
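One likely source of that error, offered as a sketch rather than a definitive diagnosis: `exec lines in envin` hands the entire contents of input.txt to the Python compiler, and a data line such as `New York, US` is not valid Python. The snippet below (Python 3 syntax; the helper name `is_valid_python` is hypothetical) uses the built-in `compile()` to show this without executing anything. Separately, `dirname` lives in `os.path`, not `os`, so `from os import dirname` would fail with an ImportError on its own.

```python
def is_valid_python(source):
    """Return True if `source` compiles as a Python module, else False."""
    try:
        # compile() only parses; nothing in `source` is executed here
        compile(source, "input.txt", "exec")
        return True
    except SyntaxError:
        return False


print(is_valid_python("New York, US"))  # False: two names side by side
print(is_valid_python("x = 1"))         # True
```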

Answer 1

I don't understand what you are trying to do, so I can't really explain how to fix it. In particular, why are you exec-ing the lines of the file? And why write exec "foo" instead of just foo? I think you should go back to a basic Python tutorial...

Anyway, what you need to do is:

open the file using its full path
for line in file: process the line and store it in a dictionary
return the dictionary

That's it, no exec involved.
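A minimal sketch of those three steps in Python 3, assuming every non-blank line has the form "City, Country". The function name `parse_file` is only illustrative, and splitting on the last comma with `rsplit` is a design choice (not from the answer) that keeps a city name containing a comma intact.

```python
import os


def parse_file(filename):
    """Return a dict mapping each country to its number of cities.

    Expects lines like "City, Country"; lines without a comma are skipped.
    """
    counts = {}
    # Step 1: open the file using its full path.
    with open(os.path.abspath(filename)) as fh:
        # Step 2: process each line and store the result in a dictionary.
        for line in fh:
            if "," not in line:
                continue  # skip blank or malformed lines
            city, country = line.rsplit(",", 1)
            country = country.strip()
            counts[country] = counts.get(country, 0) + 1
    # Step 3: return the dictionary.
    return counts
```

On the sample input.txt from the question, `parse_file("input.txt")` should return France: 3, US: 2 and Saudi Arabia: 1; printing is left to the caller.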

Answer 2

Yup, that's a whole lot of crap you either don't need or shouldn't do. Here's how I'd do it prior to Python 2.7 (from 2.7 on, use collections.Counter as shown in the other answers). Mind you, this returns the dictionary containing the counts rather than printing it; you'd have to do that externally. I'd also prefer not to give complete solutions for homework, but it's already been done, so I suppose there's no real damage in explaining a bit about it.

def parseFile(filename):
  with open(filename, 'r') as fh:
    lines = fh.readlines()
    d={}
    for country in [line.split(',')[1].strip() for line in lines]:
      d[country] = d.get(country,0) + 1
    return d

Let's break that down a bit, shall we?

  with open(filename, 'r') as fh:
    lines = fh.readlines()

This is how you'd normally open a text file for reading. It will raise an IOError exception if the file doesn't exist, you don't have permission to read it, or the like, so you'll want to catch that. readlines() reads the entire file and splits it into lines; each line becomes an element of a list.
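Catching that exception might look like the following sketch. The function name `read_lines` and the choice to print a message and return an empty list on failure are assumptions, not part of the answer. In Python 3, `IOError` is an alias of `OSError`, so the same `except` clause still works.

```python
def read_lines(filename):
    """Return the file's lines as a list, or [] if it cannot be opened."""
    try:
        with open(filename, 'r') as fh:
            # readlines() keeps the trailing newline on each line
            return fh.readlines()
    except IOError as exc:
        # Raised e.g. when the file is missing or not readable
        print("Could not read {0}: {1}".format(filename, exc))
        return []
```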

    d={}

This simply initializes an empty dictionary.

    for country in [line.split(',')[1].strip() for line in lines]:

Here is where the fun starts. The bracketed part on the right is called a list comprehension, and it generates a list for you. In plain English, it says: "for each element 'line' in the list 'lines', split that line on each comma, take the second element (index 1) of the resulting list, strip any whitespace from it, and use the result as an element of the new list." The left part then just iterates over the generated list, giving the name 'country' to the current element inside the loop body.
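To make that plain-English reading concrete, here is the same comprehension next to an equivalent explicit loop, run on a few hard-coded sample lines instead of a file. Note that a line without a comma (such as a trailing blank line) would make `[1]` raise an IndexError, so real input may need a guard.

```python
lines = ["New York, US\n", "Angers, France\n", "Mecca, Saudi Arabia\n"]

# The list comprehension from the answer
countries = [line.split(',')[1].strip() for line in lines]

# The same thing written as an explicit loop
countries_loop = []
for line in lines:
    parts = line.split(',')                  # e.g. ['New York', ' US\n']
    countries_loop.append(parts[1].strip())  # ' US\n' -> 'US'

print(countries)                    # ['US', 'France', 'Saudi Arabia']
print(countries == countries_loop)  # True
```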

      d[country] = d.get(country,0) + 1

OK, ponder for a second what would happen if, instead of the above line, we used the following:

      d[country] = d[country] + 1

It'd crash with a KeyError, right? That's because d[country] doesn't have a value the first time around. So we use the get() method, which all dictionaries have. Here's the nifty part: get() takes an optional second argument, which is what it returns if the key we're looking for doesn't exist. So instead of crashing, it returns 0, which (unlike None) we can add 1 to, and we update the dictionary with the new count. Then we just return the lot of it.
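A short sketch contrasting the two lines, using a hard-coded list of country names in place of the file:

```python
d = {}

# Indexing a missing key raises KeyError...
try:
    d['US'] = d['US'] + 1
except KeyError:
    print("KeyError: 'US' is not in d yet")

# ...while get() falls back to the default of 0 instead.
for country in ['US', 'France', 'US']:
    d[country] = d.get(country, 0) + 1

print(d)  # {'US': 2, 'France': 1}
```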

Hope it helps.

Answer 3

import collections

def readFile(fname):
    with open(fname) as inf:
        return [tuple(s.strip() for s in line.split(",")) for line in inf]

def countCountries(city_list):
    return collections.Counter(country for city,country in city_list)

def main():
    cities = readFile("input.txt")
    countries = countCountries(cities)

    print("{0} cities found in {1} countries:".format(len(cities), len(countries)))

    for country, num in countries.items():
        print("{country}: {num}".format(country=country, num=num))

if __name__=="__main__":
    main()
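A small check of the counting step above on the question's example, with no file involved: the `(city, country)` tuples are hard-coded to match what `readFile()` would return for that input.txt. `most_common()` is a standard `collections.Counter` method that sorts pairs by count, which gives one possible "some order" for the output.

```python
import collections

# (city, country) tuples, as readFile() would return for the example file
city_list = [
    ("New York", "US"), ("Angers", "France"), ("Los Angeles", "US"),
    ("Pau", "France"), ("Dunkerque", "France"), ("Mecca", "Saudi Arabia"),
]

# Same expression as countCountries()
countries = collections.Counter(country for city, country in city_list)

# most_common() yields (country, count) pairs, highest count first
for country, num in countries.most_common():
    print("{0} : {1}".format(country, num))
# France : 3
# US : 2
# Saudi Arabia : 1
```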

Answer 4

I would use a defaultdict plus a list to maintain the structure of the information, so that additional statistics can be derived.

import collections

def parse_cities(filepath):
    # Map each country to the list of its cities
    countries_cities_map = collections.defaultdict(list)
    with open(filepath) as fd:
        for line in fd:
            values = line.strip().split(',')
            if len(values) == 2:  # skip blank or malformed lines
                # strip each part, otherwise the country keeps a leading space (" US")
                city, country = [v.strip() for v in values]
                countries_cities_map[country].append(city)
    return countries_cities_map

def format_cities_per_country(countries_cities_map):
    for country, cities in countries_cities_map.items():
        print("{ncities} cities found in {country}".format(country=country, ncities=len(cities)))


if __name__ == '__main__':
    import sys
    filepath = sys.argv[1]
    format_cities_per_country(parse_cities(filepath))
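To illustrate the "additional statistics" point, here is a sketch that builds the same kind of country-to-cities map from in-memory lines (`io.StringIO` stands in for the file) and derives two example statistics. The helper name `build_map` and the choice of statistics are hypothetical, for illustration only.

```python
import collections
import io


def build_map(fd):
    """Map each country to the list of its cities, from 'City, Country' lines."""
    countries_cities_map = collections.defaultdict(list)
    for line in fd:
        values = line.strip().split(',')
        if len(values) == 2:
            city, country = [v.strip() for v in values]
            countries_cities_map[country].append(city)
    return countries_cities_map


data = io.StringIO("New York, US\nAngers, France\nLos Angeles, US\n"
                   "Pau, France\nDunkerque, France\nMecca, Saudi Arabia\n")
cmap = build_map(data)

# Statistic 1: the country with the most cities, and which cities those are
top = max(cmap, key=lambda c: len(cmap[c]))
print(top, sorted(cmap[top]))  # France ['Angers', 'Dunkerque', 'Pau']

# Statistic 2: the average number of cities per country
print(sum(len(v) for v in cmap.values()) / len(cmap))  # 2.0
```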

Python- File Parsing

Question

4 answers

solution1
4 2011-04-09 16:35:06

solution2
3 2011-04-09 17:11:47

solution3
1 2011-04-09 16:38:04

solution4
1 ACCEPTED 2011-04-09 16:58:53
