Using the filter function in Python

Question

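I'm trying to use Python's built-in filter function to extract data from certain columns of a CSV file. Is this a good use of filter? Do I have to define the data in those columns first, or does Python somehow already know which columns contain which data?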

Answer 1

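Since Python boasts of being "batteries included", someone has probably already provided a solution for most everyday situations. CSV is one of them: the standard library ships with a built-in csv module.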

tablib is also a very good third-party module, especially when you are dealing with non-ASCII data.

For the behavior you described in the comments, this will do it:

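import csv

# Python 3: open in text mode with newline='' as the csv docs recommend
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        row.pop(1)  # drop the second column
        print(", ".join(row))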
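As a self-contained variation on the snippet above, here is a minimal sketch that runs without a file on disk. The sample data and the helper name drop_column are made up for illustration. It uses slicing instead of pop, so the original row is left unmodified and a row that is too short does not raise an IndexError.

```python
import csv
import io

# Hypothetical sample data standing in for some.csv.
sample = io.StringIO("name,age,city\nAda,36,London\nLin,29,Taipei\n")

def drop_column(rows, index):
    """Yield each row without the field at position `index`."""
    for row in rows:
        yield row[:index] + row[index + 1:]

result = list(drop_column(csv.reader(sample), 1))
for row in result:
    print(", ".join(row))
```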

Answer 2

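The filter function is meant to select those elements of a list (or, more generally, of any iterable) that satisfy a certain condition. It is not really meant for index-based selection. So although you could use it to pick out specific columns of a CSV file, I would not recommend it. Instead, you should probably use something like this: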

import csv

with open(filename, newline='') as f:
    for record in csv.reader(f):
        do_something_with(record[0], record[2])
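To make the distinction between condition-based and index-based selection concrete, here is a small sketch on a single made-up row. The variable names are only for illustration.

```python
row = ["10", "", "30", "", "50"]  # hypothetical record with some empty fields

# filter() keeps the elements for which the predicate is true,
# wherever they happen to sit in the row.
non_empty = list(filter(lambda field: field != "", row))

# Index-based selection picks fixed positions regardless of their content.
first_and_third = (row[0], row[2])

print(non_empty)
print(first_and_third)
```

The first selection depends on what each field contains; the second depends only on where it is, which is what picking columns amounts to.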

Depending on what exactly you do with the records, it may be nicer to create an iterator over the columns of interest:

with open(filename, newline='') as f:
    the_iterator = ((record[0], record[2]) for record in csv.reader(f))
    # do something with the iterator (while the file is still open)

Or, if you need non-sequential processing, you can build a list:

with open(filename, newline='') as f:
    the_list = [(record[0], record[2]) for record in csv.reader(f)]
    # do something with the list
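The practical difference between the iterator and the list can be sketched as follows. The in-memory sample data is an assumption for the sake of a runnable example; with a real file you would use open as above.

```python
import csv
import io

data = "a,b,c\n1,2,3\n4,5,6\n"  # hypothetical sample contents

# Generator: rows are produced lazily, one at a time, and only once.
pairs_iter = ((r[0], r[2]) for r in csv.reader(io.StringIO(data)))
first = next(pairs_iter)
rest = list(pairs_iter)
exhausted = list(pairs_iter)  # nothing left after the first pass

# List: everything is read up front, so indexing and re-use both work.
pairs_list = [(r[0], r[2]) for r in csv.reader(io.StringIO(data))]
last = pairs_list[-1]
```

The generator suits a single sequential pass over a large file; the list costs memory but allows random access.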

I'm not sure what you mean by defining the data in the columns. The data is defined by the CSV file.
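If the file happens to start with a header row, one way to see that the file itself defines the columns is csv.DictReader from the same csv module, which takes the field names from the first row. This is only a sketch with made-up sample data, and it assumes a header row exists, which the question does not say.

```python
import csv
import io

# Hypothetical file contents; the first line is assumed to be a header.
sample = io.StringIO("name,age,city\nAda,36,London\nLin,29,Taipei\n")

# DictReader reads the field names from the first row of the input.
reader = csv.DictReader(sample)
selected = [(row["name"], row["city"]) for row in reader]
```

Note that every value still comes back as a string; the csv module does not infer types.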

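By contrast, here is a case where you would want to use filter: suppose your CSV file contains numeric data, and you need to build a list of the records whose numbers are in strictly ascending order within the row. You could write a function that determines whether a list of numbers is strictly increasing: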

from itertools import pairwise  # Python 3.10+; older versions: see the itertools recipes

def strictly_increasing(fields):
    return all(int(i) < int(j) for i, j in pairwise(fields))

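(See the itertools documentation for a definition of pairwise; on Python 3.10 and later it is available directly as itertools.pairwise.) You can then use it as the condition for filter: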

with open(filename, newline='') as f:
    # In Python 3, filter() returns a lazy iterator, so wrap it in list()
    the_list = list(filter(strictly_increasing, csv.reader(f)))
    # do something with the list

Of course, the same thing can be, and usually would be, implemented as a list comprehension:

with open(filename, newline='') as f:
    the_list = [record for record in csv.reader(f) if strictly_increasing(record)]
    # do something with the list
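Putting the pieces together, here is a runnable sketch that compares the filter version with the list comprehension. The sample rows are made up, and pairwise is defined locally following the recipe in the itertools documentation so that the example does not depend on Python 3.10.

```python
import csv
import io
from itertools import tee

def pairwise(iterable):
    """s -> (s0, s1), (s1, s2), ... (recipe from the itertools docs)."""
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

def strictly_increasing(fields):
    # Compare as integers: as strings, "10" would sort before "2".
    return all(int(i) < int(j) for i, j in pairwise(fields))

data = "1,2,3\n3,2,1\n5,5,6\n2,10,11\n"  # hypothetical numeric CSV

with_filter = list(filter(strictly_increasing, csv.reader(io.StringIO(data))))
with_comprehension = [
    record for record in csv.reader(io.StringIO(data))
    if strictly_increasing(record)
]
```

Both approaches keep the same rows; the row 5,5,6 is rejected because equal neighbours are not strictly increasing.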

So there is rarely a reason to use filter in practice.