
Reading in a CSV file AND sorting it in Python

I am trying to read in a CSV file that looks like this:

ruby,2,100
diamond,1,400
emerald,3,250
amethyst,2,50
opal,1,300
sapphire,2,500
malachite,1,60

Here is some code I have been experimenting with.

class jewel:
    def __init__(gem, name, carat, value):
        gem.name = name
        gem.carat = carat
        gem.value = value
    def __repr__(gem):
        return repr((gem.name, gem.carat, gem.value))

jewel_objects = [jewel('diamond', '1', 400),
                 jewel('ruby', '2', 200),
                 jewel('opal', '1', 600),
                ]

aList = [sorted(jewel_objects, key=lambda jewel: (jewel.value))]
print aList

I would like to read in the values and assign them to name, carat, and value, but I'm not sure how to do so. Once I have them read in, I would like to sort them by value per carat (value/carat). I have done quite a bit of searching and have come up blank. Thank you very much in advance for your help.

You need to do two things here. The first is actually loading the data into the objects. I recommend you look at the csv module in the standard Python library for this. It's very complete; it will read each row and make it easily accessible.

CSV docs: http://docs.python.org/library/csv.html
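
As a rough sketch of that loading step (written for Python 3; the Jewel class, the load_jewels name, and the in-memory sample standing in for the file are illustrative assumptions, not part of the question's code):

```python
import csv
import io


class Jewel:
    """Illustrative container for one CSV row (hypothetical, for this sketch)."""

    def __init__(self, name, carat, value):
        self.name = name
        self.carat = float(carat)   # csv yields strings, so convert explicitly
        self.value = float(value)

    def __repr__(self):
        return repr((self.name, self.carat, self.value))


def load_jewels(lines):
    """Build a list of Jewel objects from an iterable of CSV lines."""
    reader = csv.reader(lines)
    # Skip blank rows, which csv.reader reports as empty lists.
    return [Jewel(*row) for row in reader if row]


# An in-memory stand-in for the file; with a real file on Python 3 you
# would pass open('jewels.csv', newline='') instead of io.StringIO.
sample = io.StringIO("ruby,2,100\ndiamond,1,400\nemerald,3,250\n")
jewels = load_jewels(sample)
print(jewels)
```

Because csv.reader accepts any iterable of lines, the same function should work unchanged on an open file object.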

I would create a list of the objects and then either implement a __cmp__ method in your class or pass a function to sorted() that defines the ordering. (__cmp__ is honored only by Python 2; Python 3 ignores it, so there you would pass a key function to sorted().) You can get more info about sorting in the Python wiki.

Wiki docs: http://wiki.python.org/moin/HowTo/Sorting

You would implement the __cmp__ method like this in your class (this can be made a bit more efficient, but I'm being descriptive here):

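def __cmp__(gem, other):
    # Python 2 only. Assumes the data is stored as gem.carat and gem.value;
    # float() guards against integer division and string inputs.
    mine = float(gem.value) / float(gem.carat)
    theirs = float(other.value) / float(other.carat)
    if mine < theirs:
        return -1
    elif mine > theirs:
        return 1
    else:
        return 0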
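
The other option mentioned above, passing a function to sorted(), could look like the sketch below. It is written for Python 3, where __cmp__ is no longer consulted. The plain tuples and the value_per_carat helper are assumptions made for illustration.

```python
# Rows as (name, carat, value) tuples, taken from the question's sample data.
gems = [("ruby", 2, 100), ("diamond", 1, 400), ("emerald", 3, 250)]


def value_per_carat(gem):
    """Key function: sorted() calls this once per item and orders by the result."""
    name, carat, value = gem
    return value / carat


by_ratio = sorted(gems, key=value_per_carat)
print([name for name, carat, value in by_ratio])
```

With objects instead of tuples, the key function would read the attributes instead, for example lambda j: j.value / j.carat.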

Python has a csv module that should be really helpful to you.

http://docs.python.org/library/csv.html

You can use numpy structured arrays along with the csv module and use numpy.sort() to sort the data. The following code should work. Suppose your CSV file is named geminfo.csv:

import numpy as np
import csv

# 'rb' is the Python 2 convention; on Python 3 use open('geminfo.csv', newline='')
fileobj = open('geminfo.csv','rb')
csvreader = csv.reader(fileobj)

# Convert data to a list of lists
importeddata = list(csvreader)
fileobj.close()

# Convert the numeric fields to float, calculate Value/Carat,
# and store each entry as a tuple
importeddata = [(name, float(carat), float(value), float(value)/float(carat))
                for name, carat, value in importeddata]

One way to sort this data is to use numpy as shown below.

# create an empty array
data = np.zeros(len(importeddata), dtype = [('Stone Name','a20'),
                            ('Carats', 'f4'),
                            ('Value', 'f4'), 
                            ('valuepercarat', 'f4')]
                        )
data[:] = importeddata[:]
datasortedbyvaluepercarat = np.sort(data, order='valuepercarat')

For parsing real-world CSV (comma-separated values) data you'll want to use the csv module that's included with recent versions of Python.

CSV is a set of conventions rather than a standard. The sample data you show is simple and regular, but CSV generally has some ugly corner cases around quoting; the contents of any field might have embedded commas, for example.
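
To make the quoting corner case concrete, here is a small Python 3 sketch comparing a naive split with the csv module. The row itself is a made-up example, not part of the question's data.

```python
import csv
import io

# A hypothetical row whose first field contains a comma, so it must be quoted.
line = '"opal, black",1,300\n'

# Naive approach: split on every comma, ignoring the quotes.
naive = line.strip().split(',')

# csv.reader understands the quoting and keeps the field together.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # four pieces: the quoted field is split in two
print(parsed)  # three fields, with the quotes removed
```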

Here is a very crude program, based on your code, which does naïve parsing of the data (splitting by lines, then splitting each line on commas). It will not handle any data which doesn't split into precisely the correct number of fields, nor any where the numeric fields can't be parsed by the Python int() and float() functions (object constructors). In other words, it contains neither error checking nor exception handling.

However, I've kept it deliberately simple so it can be easily compared to your rough notes. Also note that I've used the normal Python conventions regarding "self" references in the class definition. (About the only time one would use names other than "self" for these is when doing "meta-class" programming ... writing classes which dynamically instantiate other classes. Any other case will almost certainly cause serious concerns in the minds of any experienced Python programmers looking at your code.)

#!/usr/bin/env python
class Jewel:
    def __init__(self, name, carat, value):
        self.name = name
        self.carat = int(carat)
        self.value = float(value)
        assert self.carat != 0      # Division by zero would result from this
    def __repr__(self):
        return repr((self.name, self.carat, self.value))

if __name__ == '__main__':
    sample='''ruby,2,100
diamond,1,400
emerald,3,250
amethyst,2,50
opal,1,300
sapphire,2,500
malachite,1,60'''

    these_jewels = list()
    for each_line in sample.split('\n'):
        gem_type, carat, value = each_line.split(',')
        these_jewels.append(Jewel(gem_type, carat, value))
        # Equivalently: 
        # these_jewels.append(Jewel(*each_line.split(',')))

    decorated = [(x.value/x.carat, x) for x in these_jewels]
    results = [x[1] for x in sorted(decorated)]
    print('\n'.join([str(x) for x in results]))

The parsing here is done simply using the string .split() method, and the data is extracted into names using Python's "tuple unpacking" syntax (this would fail if any line of input were to have the wrong number of fields).

The alternative syntax to those two lines uses Python's "apply" (argument unpacking) syntax. The * prefix on the argument causes it to be unpacked into separate arguments which are passed to the Jewel() class instantiation.
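
A tiny sketch of what the * prefix does, using a hypothetical describe() function in place of the Jewel() constructor:

```python
def describe(name, carat, value):
    """Hypothetical helper that takes three separate positional arguments."""
    return "{0}: {1} carat(s), value {2}".format(name, carat, value)


fields = "ruby,2,100".split(',')   # ['ruby', '2', '100']

# These two calls are equivalent; * spreads the list over the parameters.
explicit = describe(fields[0], fields[1], fields[2])
unpacked = describe(*fields)
print(unpacked)
```

If the list holds more or fewer than three items, the call raises TypeError, which is the same failure mode as the tuple unpacking described above.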

This code also uses the widespread (and widely recommended) DSU (decorate, sort, undecorate) pattern for sorting on some field of your data. I "decorate" the data by creating a series of tuples: (computed value, object reference), then "undecorate" the sorted data in a way which I hope is clear to you. (It would be immediately clear to any experienced Python programmer.)

Yes, the whole DSU could be reduced to a single line; I've separated it here for legibility and pedagogical purposes.
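
For comparison, one possible single-line form is sketched below in Python 3. The plain tuples stand in for the Jewel objects, and the extra index in the decoration is my own addition rather than part of the program above.

```python
# (name, carat, value) tuples standing in for the Jewel objects above.
gems = [("ruby", 2, 100.0), ("diamond", 1, 400.0), ("amethyst", 2, 50.0)]

# Decorate, sort and undecorate in a single expression. The index i breaks
# ties, so two items with equal value per carat are never compared directly.
results = [g for _, _, g in sorted((g[2] / g[1], i, g) for i, g in enumerate(gems))]
print(results)
```

The tie-breaking index matters mostly on Python 3, where comparing two objects that define no ordering raises TypeError.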

Again, this sample code is purely for your edification. You should use the csv module on any real-world data, and you should introduce exception handling either in the parsing or in the Jewel.__init__ handling (for converting the numeric data into the correct Python types). Also note that you should consider using Python's decimal module rather than float()s for representing monetary values ... or at least storing the values in cents or mils and using your own functions to represent those as dollars and cents.
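
A minimal sketch of what that exception handling might look like, in Python 3 and using decimal.Decimal for the value. The parse_row function, its error messages, and the sample rows are all assumptions made for illustration.

```python
from decimal import Decimal, InvalidOperation


def parse_row(fields):
    """Return (name, carat, value) or raise ValueError with a readable message."""
    if len(fields) != 3:
        raise ValueError("expected 3 fields, got {0}: {1!r}".format(len(fields), fields))
    name, carat_text, value_text = fields
    try:
        carat = int(carat_text)
        value = Decimal(value_text)      # exact decimal arithmetic for money
    except (ValueError, InvalidOperation):
        raise ValueError("non-numeric carat or value in {0!r}".format(fields))
    if carat == 0:
        raise ValueError("carat must be non-zero in {0!r}".format(fields))
    return name, carat, value


# Collect the good rows and report the bad ones instead of stopping.
good, bad = [], []
for row in (["ruby", "2", "100"], ["opal", "x", "300"], ["jade", "1"]):
    try:
        good.append(parse_row(row))
    except ValueError as err:
        bad.append(str(err))

print(good)
print(bad)
```

Whether to skip bad rows, as here, or to abort on the first one is a design choice that depends on how much you trust the input.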

import csv
import operator

class Jewel(object):
    @classmethod
    def fromSeq(cls, seq):
        return cls(*seq)

    def __init__(self, name, carat, value):
        self.name  = str(name)
        self.carat = float(carat)
        self.value = float(value)

    def __repr__(self):
        return "{0}{1}".format(self.__class__.__name__, (self.name, self.carat, self.value))

    @property
    def valuePerCarat(self):
        return self.value / self.carat

def loadJewels(fname):
    # 'rb' is the Python 2 convention; on Python 3 use open(fname, newline='')
    with open(fname, 'rb') as inf:
        incsv = csv.reader(inf)
        jewels = [Jewel.fromSeq(row) for row in incsv if row]
    jewels.sort(key=operator.attrgetter('valuePerCarat'))
    return jewels

def main():
    jewels = loadJewels('jewels.csv')
    for jewel in jewels:
        print("{0:35} ({1:>7.2f})".format(jewel, jewel.valuePerCarat))

if __name__=="__main__":
    main()

produces

Jewel('amethyst', 2.0, 50.0)        (  25.00)
Jewel('ruby', 2.0, 100.0)           (  50.00)
Jewel('malachite', 1.0, 60.0)       (  60.00)
Jewel('emerald', 3.0, 250.0)        (  83.33)
Jewel('sapphire', 2.0, 500.0)       ( 250.00)
Jewel('opal', 1.0, 300.0)           ( 300.00)
Jewel('diamond', 1.0, 400.0)        ( 400.00)    


 