Finding duplicates in multiple lists read from a CSV file (Python)

Question

The title may seem confusing, but let's say I'm working with the following CSV file ("names.csv").

    name1,name2,name3
    Bob,Jane,Joe
    Megan,Tom,Jane
    Jane,Joe,Rob

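My question is how I would write code that returns the strings that appear at least 3 times. So the output should be "Jane", because that occurs at least 3 times. I'm really confused here. Perhaps some sample code would help me understand better?

So far I have: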

    import csv
    reader = csv.DictReader(open("names.csv"))

    for row in reader:
        names = [row['name1'], row['name2'], row['name3']]
        print names

This returns:

    ['Bob', 'Jane', 'Joe']
    ['Megan', 'Tom', 'Jane']
    ['Jane', 'Joe', 'Rob']

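Where do I go from here? Or am I going about this the wrong way? I'm really new to Python (well, to programming altogether), so I have close to no idea what I'm doing.

Cheers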

Answer 1

I'd do it like this:

>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> rows = [['Bob', 'Jane', 'Joe'],
... ['Megan', 'Tom', 'Jane'],
... ['Jane', 'Joe', 'Rob']]
>>> for row in rows:
...     for name in row:
...         d[name] += 1
... 
>>> filter(lambda x: x[1] >= 3, d.iteritems())
[('Jane', 3)]

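It uses a dict with a default value of 0 to count how many times each name occurs in the file, then filters the dict on the condition (count >= 3). Note that this session is Python 2: there, filter() returns a list and dicts have iteritems().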
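The same counting idea can be sketched for Python 3 with collections.Counter, which is a dict subclass built for tallying. This is only an illustration under assumptions: the `rows` data is copied from the session above, and the names `threshold` and `frequent` are made up for the example.

```python
from collections import Counter

rows = [
    ["Bob", "Jane", "Joe"],
    ["Megan", "Tom", "Jane"],
    ["Jane", "Joe", "Rob"],
]

# Count every name across all rows in a single pass.
counts = Counter(name for row in rows for name in row)

# Keep only the names seen at least `threshold` times (hypothetical name).
threshold = 3
frequent = [(name, n) for name, n in counts.items() if n >= threshold]
print(frequent)  # [('Jane', 3)]
```

A list comprehension is used instead of filter() because in Python 3 filter() returns a lazy iterator rather than a list.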

Answer 2

Putting it all together (and showing proper csv.reader usage):

import csv
import collections

d = collections.defaultdict(int)  # names not seen yet start at a count of 0
with open("names.csv", "rb") as f:  # Python 3.x: use newline="" instead of "rb"
    reader = csv.reader(f)
    next(reader)  # ignore useless heading row
    for row in reader:
        for name in row:
            name = name.strip()
            if name:  # skip empty cells
                d[name] += 1
# Python 3.x: use d.items() and print(name, count) below
morethan3 = [(name, count) for name, count in d.iteritems() if count >= 3]
morethan3.sort(key=lambda x: x[1], reverse=True)  # most frequent first
for name, count in morethan3:
    print name, count
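For readers on Python 3, the script above might be ported roughly as follows. It is a sketch, not the answerer's code: the function name `count_names` and its `min_count` parameter are invented for illustration, and an in-memory io.StringIO stands in for open("names.csv", newline="") so the example runs without a file on disk.

```python
import csv
import io
from collections import defaultdict


def count_names(f, min_count=3):
    """Return (name, count) pairs seen at least min_count times, most frequent first."""
    counts = defaultdict(int)
    reader = csv.reader(f)
    next(reader, None)  # skip the heading row; the default avoids an error on an empty file
    for row in reader:
        for name in row:
            name = name.strip()
            if name:  # skip empty cells
                counts[name] += 1
    frequent = [(name, n) for name, n in counts.items() if n >= min_count]
    frequent.sort(key=lambda pair: pair[1], reverse=True)
    return frequent


# Stand-in for: with open("names.csv", newline="") as f: ...
sample = io.StringIO("name1,name2,name3\nBob,Jane,Joe\nMegan,Tom,Jane\nJane,Joe,Rob\n")
for name, n in count_names(sample):
    print(name, n)  # Jane 3
```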

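Update in response to a comment:

You need to read through the whole CSV file whether you use the DictReader approach or not. If you want to ignore, for example, the "name2" column (a column, not a row), then just ignore it. You don't need to save all the data, as your use of the variable name "rows" suggests. Here is code for a more general approach that doesn't depend on the column headings being in a particular order and lets you select or reject particular columns. It replaces the reader loop inside the with block above.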

    reader = csv.DictReader(f)  # uses the heading row as dict keys, so no need to skip it
    required_columns = ['name1', 'name3'] #### adjust this line as needed ####
    for row in reader:
        for col in required_columns:
            name = row[col].strip()
            if name:
                d[name] += 1
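As a self-contained illustration of the column-selection idea, here is a minimal Python 3 sketch. The helper name `count_in_columns` is hypothetical, the sample data is the question's names.csv held in an io.StringIO, and row.get() is used instead of row[col] as an assumption about how short rows should be handled (DictReader fills missing fields with None by default).

```python
import csv
import io
from collections import Counter


def count_in_columns(f, columns):
    """Count non-empty values found in the given columns of a CSV file object."""
    counts = Counter()
    for row in csv.DictReader(f):  # the heading row supplies the dict keys
        for col in columns:
            # A short row yields None for missing fields, so fall back to "".
            name = (row.get(col) or "").strip()
            if name:
                counts[name] += 1
    return counts


sample = io.StringIO("name1,name2,name3\nBob,Jane,Joe\nMegan,Tom,Jane\nJane,Joe,Rob\n")
counts = count_in_columns(sample, ["name1", "name3"])
print(counts["Jane"])  # 2
```

With the "name2" column ignored, Jane is counted only twice, so no name reaches the threshold of 3 for this sample.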

Answer 1
0 2011-05-07 08:37:07

Answer 2
0 Accepted 2011-05-07 11:15:26
