Using the filter function in Python

Question

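I'm trying to use Python's built-in filter function to extract data from certain columns of a CSV file. Is this a good use of filter? Do I have to define the data in those columns first, or does Python somehow already know which columns contain which data?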

Answer 1

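Since Python boasts of coming with "batteries included", a solution already exists for most everyday tasks. CSV is one of them: the standard library has a built-in csv module.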

tablib is also a very good third-party module, especially when dealing with non-ASCII data.

For the behaviour you described in your comment, the following will do:

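import csv

# newline='' lets the csv module handle line endings itself (Python 3)
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        row.pop(1)  # drop the second column
        print(", ".join(row))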
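To try the loop above without a file on disk, here is a minimal sketch that feeds the csv module an in-memory string. The sample data and the helper name drop_column are assumptions made for illustration, not part of the original answer.

```python
import csv
import io

# Hypothetical sample data standing in for some.csv.
SAMPLE = "name,age,city\nAda,36,London\nAlan,41,Wilmslow\n"

def drop_column(text, index):
    """Return the CSV rows in `text` with the column at `index` removed."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        row.pop(index)  # remove the unwanted column in place
        rows.append(row)
    return rows

for row in drop_column(SAMPLE, 1):
    print(", ".join(row))
```

Collecting the rows in a list rather than printing inside the loop makes the result easy to reuse or check.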

Answer 2

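The filter function is meant to select, from a list (or, more generally, any iterable), those elements that satisfy a certain condition. It isn't really meant for index-based selection. So although you could use it to pick out specific columns of a CSV file, I wouldn't recommend it. Instead, you should probably use something like this: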

import csv

with open(filename, newline='') as f:
    for record in csv.reader(f):
        # pick the first and third columns by index
        do_something_with(record[0], record[2])
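To illustrate why filter is an awkward fit for index-based selection, the sketch below picks columns 0 and 2 of a single record in both ways. The record contents and the names wanted, via_filter and via_index are made up for the example.

```python
record = ['Ada', '36', 'London']  # hypothetical row, as csv.reader would yield it
wanted = {0, 2}                   # column indexes we want to keep

# With filter: attach indexes first, test them, then strip them off again.
via_filter = [value for index, value in
              filter(lambda pair: pair[0] in wanted, enumerate(record))]

# With plain indexing: the intent is stated directly.
via_index = [record[0], record[2]]

print(via_filter == via_index)
```

Both forms give the same values, but the filter version has to work around the fact that filter only sees elements, not their positions.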

Depending on exactly what you do with the records, it might be better to create an iterator over the columns of interest:

with open(filename, newline='') as f:
    the_iterator = ((record[0], record[2]) for record in csv.reader(f))
    # do something with the iterator here, while the file is still open

Or, if you need non-sequential processing, a list:

with open(filename, newline='') as f:
    the_list = [(record[0], record[2]) for record in csv.reader(f)]
    # do something with the list
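The difference between the iterator and the list can be seen in a small sketch on in-memory data. The sample rows and the variable names are assumptions for illustration only.

```python
import csv
import io

SAMPLE = "1,x,10\n2,y,20\n3,z,30\n"  # hypothetical three-column data

# Lazy: pairs are produced one at a time, as they are requested.
lazy_pairs = ((record[0], record[2]) for record in csv.reader(io.StringIO(SAMPLE)))
first = next(lazy_pairs)   # pulls just the first pair
rest = list(lazy_pairs)    # consumes the remaining pairs

# Eager: all pairs are built up front and can be indexed in any order.
all_pairs = [(record[0], record[2]) for record in csv.reader(io.StringIO(SAMPLE))]

print(first, rest, all_pairs[-1])
```

With a real file, the generator has to be consumed inside the with block, because the file is closed once the block ends; the list has no such restriction.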

I'm not sure what you mean by defining the data in the columns. The data is defined by the CSV file.
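If the file has a header row, one way to refer to columns by name rather than by position is csv.DictReader, which takes its field names from the first row. This is only a sketch: the sample data and the column names name, age and city are assumptions, not something from the question.

```python
import csv
import io

# Hypothetical file contents with a header row.
SAMPLE = "name,age,city\nAda,36,London\nAlan,41,Wilmslow\n"

reader = csv.DictReader(io.StringIO(SAMPLE))
# Each row is a mapping from header name to the (string) value in that column.
pairs = [(row['name'], row['city']) for row in reader]

print(reader.fieldnames)
print(pairs)
```

Note that the csv module does not infer types: every value, including the ages here, comes back as a string.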

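By contrast, here is a case where you would want to use filter: suppose your CSV file contains numeric data and you need to build a list of the records whose numbers are in strictly increasing order within the row. You can write a function that determines whether a list of numbers is strictly increasing: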

def strictly_increasing(fields):
    # True when every field is smaller than the one after it
    return all(int(i) < int(j) for i, j in pairwise(fields))

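(See the itertools documentation for the definition of pairwise; since Python 3.10 it is also available as itertools.pairwise.) You can then use it as the condition for filter: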

with open(filename, newline='') as f:
    # in Python 3, filter is lazy, so build the list while the file is open
    the_list = list(filter(strictly_increasing, csv.reader(f)))
    # do something with the list
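Put together, the filter approach can be sketched as a self-contained script. The pairwise helper follows the tee-based recipe from the itertools documentation; the sample data and the name kept are assumptions for illustration.

```python
import csv
import io
from itertools import tee

def pairwise(iterable):
    """Return overlapping pairs: s -> (s0, s1), (s1, s2), ..."""
    a, b = tee(iterable)
    next(b, None)  # advance the second copy by one element
    return zip(a, b)

def strictly_increasing(fields):
    # True when every field is smaller than the one after it
    return all(int(i) < int(j) for i, j in pairwise(fields))

SAMPLE = "1,2,3\n3,2,1\n5,7,7\n10,20,30\n"  # hypothetical numeric data

# Keep only the rows whose numbers strictly increase from left to right.
kept = list(filter(strictly_increasing, csv.reader(io.StringIO(SAMPLE))))
print(kept)
```

The row 5,7,7 is dropped because the comparison is strict, so equal neighbours do not count as increasing.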

Of course, the same thing can be, and usually is, written as a list comprehension:

with open(filename, newline='') as f:
    the_list = [record for record in csv.reader(f) if strictly_increasing(record)]
    # do something with the list

So there is little reason to use filter in practice.
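One practical difference worth keeping in mind, assuming Python 3: filter returns a lazy iterator that can be consumed only once, while a list comprehension builds a list immediately. A small sketch, with is_even and the sample numbers as hypothetical stand-ins:

```python
def is_even(n):
    return n % 2 == 0

numbers = [1, 2, 3, 4, 5, 6]  # hypothetical data

lazy = filter(is_even, numbers)             # Python 3: a lazy filter object
eager = [n for n in numbers if is_even(n)]  # a list, built immediately

same = list(lazy) == eager  # consuming the iterator yields the same items
leftover = list(lazy)       # the iterator is now exhausted

print(same, leftover)
```

The selected elements are the same either way; only when and how often they can be read differs.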