
Using the filter function in Python

I am trying to use Python's built-in filter function to extract data from certain columns in a CSV file. Is this a good use of filter? Would I have to define the data in these columns first, or does Python somehow already know which columns contain what data?

Python advertises itself as "batteries included", so for most everyday tasks someone has probably already provided a solution. CSV is one of them: there is a built-in csv module.

tablib is also a very good third-party module, especially if you're dealing with non-ASCII data.

For the behaviour you described in the comment, this will do:

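import csv

# Python 3: the csv module expects text-mode files opened with newline=''
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        row.pop(1)  # drop the second column
        print(", ".join(row))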
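A self-contained variant of the snippet above, as a minimal sketch. It assumes Python 3, reads made-up sample data from an in-memory string instead of some.csv, and uses a hypothetical helper named drop_column. Unlike pop, it builds new rows instead of mutating the ones the reader returns.

```python
import csv
import io

# Sample data standing in for some.csv (made up for illustration).
SAMPLE = "name,age,city\nAda,36,London\nLinus,28,Helsinki\n"


def drop_column(rows, index):
    """Yield each row without the field at position `index`."""
    for row in rows:
        # Slicing never raises, so rows too short to have that
        # column pass through unchanged.
        yield row[:index] + row[index + 1:]


reader = csv.reader(io.StringIO(SAMPLE))
result = list(drop_column(reader, 1))
for row in result:
    print(", ".join(row))
```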

The filter function is intended to select from a list (or, more generally, any iterable) those elements that satisfy a certain condition. It's not really intended for index-based selection. So although you could use it to pick out specified columns of a CSV file, I wouldn't recommend it. Instead, you should probably use something like this:

import csv

with open(filename, newline='') as f:
    for record in csv.reader(f):
        # filename and do_something_with are placeholders for your own path and function
        do_something_with(record[0], record[2])

Depending on what exactly you are doing with the records, it may be better to create an iterator over the columns of interest:

with open(filename, newline='') as f:
    the_iterator = ((record[0], record[2]) for record in csv.reader(f))
    # do something with the iterator here, while the file is still open

or, if you need non-sequential processing, perhaps a list:

with open(filename, newline='') as f:
    the_list = [(record[0], record[2]) for record in csv.reader(f)]
    # do something with the list
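To make the iterator-versus-list difference concrete, here is a small sketch that runs on its own. It assumes Python 3 and made-up in-memory data, and it swaps the explicit tuple for operator.itemgetter, which is just one alternative way to pick columns 0 and 2. The names SAMPLE, pick, lazy_pairs and all_pairs are invented for this illustration.

```python
import csv
import io
from operator import itemgetter

# Made-up data standing in for the contents of a CSV file.
SAMPLE = "id,name,score\n1,Ada,90\n2,Linus,85\n"

# itemgetter(0, 2) returns a callable that builds (row[0], row[2]).
pick = itemgetter(0, 2)

# Lazy: rows are parsed only as the generator is consumed.
lazy_pairs = (pick(row) for row in csv.reader(io.StringIO(SAMPLE)))
first = next(lazy_pairs)

# Eager: everything is read up front, so indexing and repeated
# iteration both work.
all_pairs = [pick(row) for row in csv.reader(io.StringIO(SAMPLE))]
last = all_pairs[-1]
```

With a real file, a generator like lazy_pairs has to be consumed inside the with block, because reading from the file fails once it has been closed. The list has no such restriction.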

I'm not sure what you mean by defining the data in the columns. The data are defined by the CSV file.
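If the question is whether columns can be referred to by name, one possibility, assuming the file starts with a header row, is csv.DictReader from the same module. The sketch below uses made-up sample data and Python 3; the column names are whatever the header row happens to contain, not something Python infers.

```python
import csv
import io

# Made-up data; the first line is treated as the header row.
SAMPLE = "name,age,city\nAda,36,London\nLinus,28,Helsinki\n"

reader = csv.DictReader(io.StringIO(SAMPLE))

# Each row is a dict keyed by the header fields.
names_and_cities = [(row["name"], row["city"]) for row in reader]

# The header fields themselves are available as reader.fieldnames.
columns = reader.fieldnames
```

All values still come back as strings, so numeric columns need an explicit conversion such as int(row["age"]).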


By comparison, here's a case in which you would want to use filter: suppose your CSV file contains numeric data, and you need to build a list of the records in which the numbers are in strictly increasing order within the row. You could write a function to determine whether a list of numbers is in strictly increasing order:

def strictly_increasing(fields):
    # True if every field, read as an integer, is smaller than the one after it
    return all(int(i) < int(j) for i, j in pairwise(fields))

(see the itertools documentation for a definition of pairwise; from Python 3.10 onward it is available as itertools.pairwise). Then you can use this as the condition in filter:

with open(filename, newline='') as f:
    # in Python 3, filter returns a lazy iterator, so wrap it in list()
    the_list = list(filter(strictly_increasing, csv.reader(f)))
    # do something with the list

Of course, the same thing could, and usually would, be implemented as a list comprehension:

with open(filename, newline='') as f:
    the_list = [record for record in csv.reader(f) if strictly_increasing(record)]
    # do something with the list

so there's little reason to use filter in practice.
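For anyone who wants to run the comparison, here is a self-contained sketch, assuming Python 3 and made-up in-memory data. The fallback definition of pairwise follows the recipe from the itertools documentation and is only used on versions before 3.10. SAMPLE, with_filter and with_comprehension are names invented for this example.

```python
import csv
import io

try:
    from itertools import pairwise  # available in Python 3.10+
except ImportError:
    from itertools import tee

    def pairwise(iterable):
        """Fallback based on the recipe in the itertools documentation."""
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

# Made-up numeric rows standing in for a CSV file.
SAMPLE = "1,2,3\n3,2,1\n1,1,2\n10,20,30\n"


def strictly_increasing(fields):
    # Compare as integers: as strings, "10" would sort before "2".
    return all(int(i) < int(j) for i, j in pairwise(fields))


# filter returns a lazy iterator in Python 3, hence the list() call.
with_filter = list(filter(strictly_increasing, csv.reader(io.StringIO(SAMPLE))))

with_comprehension = [
    record
    for record in csv.reader(io.StringIO(SAMPLE))
    if strictly_increasing(record)
]
```

Both variants keep the same two rows; the row 1,1,2 is dropped because the comparison is strict.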
