
Reading in a CSV file AND sorting it in Python

I am trying to read in a CSV file that looks like this:

ruby,2,100
diamond,1,400
emerald,3,250
amethyst,2,50
opal,1,300
sapphire,2,500
malachite,1,60

Here is some code I have been experimenting with.

class jewel:
    def __init__(gem, name, carat, value):
            gem.name = name
            gem.carot = carat
            gem.value = value
    def __repr__(gem):
            return repr((gem.name, gem.carat, gem.value))

jewel_objects = [jewel('diamond', '1', 400),
                 jewel('ruby', '2', 200),
                 jewel('opal', '1', 600),
                ]

aList = [sorted(jewel_objects, key=lambda jewel: (jewel.value))]
print aList

I would like to read in the values and assign them to name, carat, and value, but I'm not sure how to do so. Once they are read in, I would like to sort them by value per carat (value/carat). I have done quite a bit of searching and have come up blank. Thank you very much in advance for your help.

You need to do two things here. The first is actually loading the data into the objects. I recommend you look at the csv module in the Python standard library for this. It's very complete, and it will read each row and make its fields easily accessible.

CSV docs: http://docs.python.org/library/csv.html
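As a rough illustration of that first step, here is a minimal Python 3 sketch that reads rows with csv.reader and builds one object per row. The Gem class, the load_gems helper, and the in-memory sample are assumptions made for the example, not part of the question's code.

```python
import csv
import io

class Gem:
    """Hypothetical stand-in for the question's jewel class."""
    def __init__(self, name, carat, value):
        self.name = name
        self.carat = float(carat)   # csv yields strings, so convert explicitly
        self.value = float(value)

    def __repr__(self):
        return repr((self.name, self.carat, self.value))

def load_gems(fileobj):
    """Build one Gem per non-empty CSV row (assumed layout: name,carat,value)."""
    return [Gem(*row) for row in csv.reader(fileobj) if row]

# In-memory sample standing in for the real file.
sample = io.StringIO("ruby,2,100\ndiamond,1,400\nemerald,3,250\n")
gems = load_gems(sample)
print(gems)
```

With a real file on Python 3 you would pass something like open('geminfo.csv', newline='') instead of the StringIO object.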

I would create a list of the objects and then either implement a __cmp__ method in your class (Python 2 only; Python 3 ignores it) or pass a function to sorted() that defines the ordering. You can get more info about sorting in the Python wiki.

Wiki docs: http://wiki.python.org/moin/HowTo/Sorting

You would implement the __cmp__ method like this in your class (this can be made a bit more efficient, but I'm being descriptive here):

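def __cmp__(gem, other):
    # Python 2 only. 'carot' is the attribute name set by __init__ in the question.
    # float() avoids integer division and copes with values read in as strings.
    mine = float(gem.value) / float(gem.carot)
    theirs = float(other.value) / float(other.carot)
    if mine < theirs:
        return -1
    elif mine > theirs:
        return 1
    else:
        return 0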
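Since __cmp__ is not consulted on Python 3, one possible equivalent there is to define __eq__ and __lt__ and let functools.total_ordering fill in the remaining comparisons. This is only a sketch: the Gem class and its value_per_carat helper are made-up names, and the attribute is spelled carat here rather than carot.

```python
from functools import total_ordering

@total_ordering
class Gem:
    """Hypothetical class ordered by value per carat."""
    def __init__(self, name, carat, value):
        self.name = name
        self.carat = float(carat)
        self.value = float(value)

    def value_per_carat(self):
        return self.value / self.carat

    def __eq__(self, other):
        if not isinstance(other, Gem):
            return NotImplemented
        return self.value_per_carat() == other.value_per_carat()

    def __lt__(self, other):
        if not isinstance(other, Gem):
            return NotImplemented
        return self.value_per_carat() < other.value_per_carat()

    def __repr__(self):
        return repr((self.name, self.carat, self.value))

gems = [Gem('ruby', 2, 100), Gem('diamond', 1, 400), Gem('amethyst', 2, 50)]
ordered = sorted(gems)              # uses __lt__
names = [g.name for g in ordered]
print(names)
```

Note that defining __eq__ this way makes two different stones with the same ratio compare equal, which may or may not be what you want.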

Python has a csv module that should be really helpful to you.

http://docs.python.org/library/csv.html

You can use numpy structured arrays along with the csv module and use numpy.sort() to sort the data. The following code should work, assuming your CSV file is named geminfo.csv:

import numpy as np
import csv

fileobj = open('geminfo.csv','rb')
csvreader = csv.reader(fileobj)

# Convert data to a list of lists
importeddata = list(csvreader)

# Convert the numeric fields (csv returns strings), calculate Value/Carat,
# and store each entry as a tuple
importeddata = [(name, float(carats), float(value), float(value) / float(carats))
                for name, carats, value in importeddata]

One way to sort this data is to use numpy as shown below.

# create an empty array
data = np.zeros(len(importeddata), dtype = [('Stone Name','a20'),
                            ('Carats', 'f4'),
                            ('Value', 'f4'), 
                            ('valuepercarat', 'f4')]
                        )
data[:] = importeddata[:]
datasortedbyvaluepercarat = np.sort(data, order='valuepercarat')

For parsing real-world CSV (comma-separated values) data you'll want to use the csv module that's included with recent versions of Python.

CSV is a set of conventions rather than a strict standard. The sample data you show is simple and regular, but CSV in general has some ugly corner cases around quoting, for example when the contents of a field contain embedded commas.
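To make the quoting problem concrete, here is a small sketch comparing a naive split with csv.reader. The input row is invented for the example; it is not from the question's data.

```python
import csv
import io

# Hypothetical row whose first field contains an embedded comma.
line = '"ruby, star",2,100'

# Naive splitting breaks the quoted field into two pieces.
naive = line.split(',')

# csv.reader understands the quoting and returns three fields.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)
print(parsed)
```

The naive version ends up with four fields, so tuple unpacking into three names would fail on this row.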

Here is a very crude program, based on your code, which does naïve parsing of the data (splitting by lines, then splitting each line on commas). It will not handle any data which doesn't split to precisely the correct number of fields, nor any where the numeric fields aren't correctly parsed by the Python int() and float() functions (object constructors). In other words this contains no error checking nor exception handling.

However, I've kept it deliberately simple so it can be easily compared to your rough notes. Also note that I've used the normal Python convention of naming the instance reference "self" in the class definition. (About the only time one would use a different name is in class methods and metaclasses, where the convention is "cls". Any other name will almost certainly raise serious concerns in the minds of experienced Python programmers looking at your code.)

#!/usr/bin/env python
class Jewel:
    def __init__(self, name, carat, value):
        self.name = name
        self.carat = int(carat)
        self.value = float(value)
        assert self.carat != 0      # A zero carat would cause division by zero later
    def __repr__(self):
        return repr((self.name, self.carat, self.value))

if __name__ == '__main__':
    sample='''ruby,2,100
diamond,1,400
emerald,3,250
amethyst,2,50
opal,1,300
sapphire,2,500
malachite,1,60'''

    these_jewels = list()
    for each_line in sample.split('\n'):
        gem_type, carat, value = each_line.split(',')
        these_jewels.append(Jewel(gem_type, carat, value))
        # Equivalently: 
        # these_jewels.append(Jewel(*each_line.split(',')))

    decorated = [(x.value/x.carat, x) for x in these_jewels]
    results = [x[1] for x in sorted(decorated)]
    print('\n'.join([str(x) for x in results]))

The parsing here is done simply using the string .split() method, and the data is extracted into names using Python's "tuple unpacking" syntax (this would fail if any line of input were to have the wrong number of fields).

The alternative syntax to those two lines uses Python's argument unpacking (the modern replacement for the old apply() built-in). The * prefix on the argument causes the list to be unpacked into separate positional arguments, which are passed to the Jewel() class instantiation.
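A tiny sketch of what the * prefix does, using a made-up describe() function in place of the Jewel class:

```python
def describe(name, carat, value):
    """Hypothetical function taking the same three arguments as Jewel()."""
    return "{0}: {1} carat(s), value {2}".format(name, carat, value)

fields = 'ruby,2,100'.split(',')

# These two calls are equivalent.
explicit = describe(fields[0], fields[1], fields[2])
unpacked = describe(*fields)

# With the wrong number of fields the call itself raises TypeError.
error = None
try:
    describe(*'ruby,2'.split(','))
except TypeError as exc:
    error = exc

print(unpacked)
```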

This code also uses the widespread (and widely recommended) DSU (decorate, sort, undecorate) pattern for sorting on some field of your data. I "decorate" the data by creating a series of tuples: (computed value, object reference), then "undecorate" the sorted data in a way which I hope is clear to you. (It would be immediately clear to any experienced Python programmer).

Yes the whole DSU could be reduced to a single line; I've separated it here for legibility and pedagogical purposes.
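For comparison, here is a sketch of the one-line DSU next to the key= form of sorted(), which performs the same decoration internally. The Gem namedtuple and the sample values are stand-ins chosen for the example. The index in the decorated tuple is an added tie-breaker, so two stones with equal ratios never fall back to comparing the objects themselves.

```python
from collections import namedtuple

# Hypothetical lightweight record used only for this illustration.
Gem = namedtuple('Gem', 'name carat value')

gems = [Gem('ruby', 2, 100.0), Gem('diamond', 1, 400.0),
        Gem('emerald', 3, 250.0), Gem('amethyst', 2, 50.0)]

# Decorate, sort, undecorate in a single expression.
dsu = [g for _, _, g in sorted((g.value / g.carat, i, g)
                               for i, g in enumerate(gems))]

# The same ordering using a key function.
by_key = sorted(gems, key=lambda g: g.value / g.carat)

names = [g.name for g in by_key]
print(names)
```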

Again, this sample code is purely for your edification. You should use the csv module on any real-world data, and you should introduce exception handling either in the parsing or in Jewel.__init__ (where the numeric data is converted into the correct Python types). Also note that you should consider using Python's decimal module rather than floats for representing monetary values, or at least storing the values in cents or mils and using your own functions to represent those as dollars and cents.
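One way those two suggestions might look together is sketched below. The parse_gem function and its error messages are hypothetical, and a whole-number carat field is assumed, as in the sample data.

```python
from decimal import Decimal, InvalidOperation

def parse_gem(line):
    """Return (name, carat, value) or raise ValueError with a readable message."""
    fields = line.split(',')
    if len(fields) != 3:
        raise ValueError("expected 3 fields, got {0}: {1!r}".format(len(fields), line))
    name, carat_text, value_text = (f.strip() for f in fields)
    try:
        carat = int(carat_text)       # assumption: carats are whole numbers
        value = Decimal(value_text)   # exact decimal arithmetic for money
    except (ValueError, InvalidOperation):
        raise ValueError("non-numeric carat or value in {0!r}".format(line))
    if carat <= 0:
        raise ValueError("carat must be positive in {0!r}".format(line))
    return name, carat, value

good = parse_gem('emerald,3,250')
per_carat = (good[2] / good[1]).quantize(Decimal('0.01'))

# Collect messages for malformed lines instead of crashing on the first one.
errors = []
for bad in ('opal,1', 'opal,one,300', 'opal,1,lots'):
    try:
        parse_gem(bad)
    except ValueError as exc:
        errors.append(str(exc))

print(per_carat)
print(errors)
```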

import csv
import operator

class Jewel(object):
    @classmethod
    def fromSeq(cls, seq):
        return cls(*seq)

    def __init__(self, name, carat, value):
        self.name  = str(name)
        self.carat = float(carat)
        self.value = float(value)

    def __repr__(self):
        return "{0}{1}".format(self.__class__.__name__, (self.name, self.carat, self.value))

    @property
    def valuePerCarat(self):
        return self.value / self.carat

def loadJewels(fname):
    with open(fname, 'rb') as inf:  # Python 2; on Python 3 use open(fname, newline='')
        incsv = csv.reader(inf)
        jewels = [Jewel.fromSeq(row) for row in incsv if row]
    jewels.sort(key=operator.attrgetter('valuePerCarat'))
    return jewels

def main():
    jewels = loadJewels('jewels.csv')
    for jewel in jewels:
        print("{0:35} ({1:>7.2f})".format(jewel, jewel.valuePerCarat))

if __name__=="__main__":
    main()

produces

Jewel('amethyst', 2.0, 50.0)        (  25.00)
Jewel('ruby', 2.0, 100.0)           (  50.00)
Jewel('malachite', 1.0, 60.0)       (  60.00)
Jewel('emerald', 3.0, 250.0)        (  83.33)
Jewel('sapphire', 2.0, 500.0)       ( 250.00)
Jewel('opal', 1.0, 300.0)           ( 300.00)
Jewel('diamond', 1.0, 400.0)        ( 400.00)    
