Extracting unique values from a CSV with Python

Question

I am using the following Python script to remove duplicates from a CSV file:

with open('test.csv','r') as in_file, open('final.csv','w') as out_file:
    seen = set() # set for fast O(1) amortized lookup
    for line in in_file:
        if line in seen: continue # skip duplicate

        seen.add(line)
        out_file.write(line)

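I am trying to modify it so that, instead of writing the de-duplicated list to final.csv, it writes only the unique values that were found.

That is roughly the opposite of what it does now. Could someone give an example?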

Answer 1

Use a dict to track how many times each line occurs. You can then walk the dict, add only the unique items to the seen set, and write them to final.csv:

from collections import defaultdict
uniques = defaultdict(int)
with open('test.csv','r') as in_file, open('final.csv','w') as out_file:
    seen = set() # set for fast O(1) amortized lookup
    for line in in_file:
        uniques[line] +=1
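    for k, v in uniques.items():  # .items() on Python 3; .iteritems() exists only on Python 2
        if v == 1:  # comparison, not assignment: keep lines that occurred exactly once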
            seen.add(k)
            out_file.write(k)

Or:

from collections import defaultdict
uniques = defaultdict(int)
with open('test.csv','r') as in_file, open('final.csv','w') as out_file:
    seen = set() # set for fast O(1) amortized lookup
    for line in in_file:
        uniques[line] +=1

    seen = set(k for k in uniques if uniques[k] == 1)
    for itm in seen:
        out_file.write(itm)

Alternatively, use Counter:

from collections import Counter

with open('test.csv','r') as in_file, open('final.csv','w') as out_file:
    seen = set() # set for fast O(1) amortized lookup
    lines = Counter(in_file.readlines())  # count every line; the input handle is in_file
    seen = set(k for k in lines if lines[k] == 1)
    for itm in seen:
        out_file.write(itm)

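This writes only the lines that appear exactly once, which may or may not be what you want, depending on what you mean by "unique". If you instead want every distinct line written, but only one instance of each, adapt the last approach: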

from collections import Counter

with open('test.csv','r') as in_file, open('final.csv','w') as out_file:
    lines = Counter(in_file.readlines())  # keys are the distinct lines

    for itm in lines:
        out_file.write(itm)
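
One caveat about the snippets that iterate over the seen set: a set does not keep the order in which lines appeared in the file. If output order matters, the following is a minimal sketch, assuming Python 3.7+ (where dict and Counter preserve insertion order). The function names lines_seen_once and distinct_lines are made up for illustration and are not part of the answer above.

```python
from collections import Counter


def lines_seen_once(lines):
    """Return lines that occur exactly once, in their original order."""
    counts = Counter(lines)  # insertion-ordered on Python 3.7+
    return [line for line, n in counts.items() if n == 1]


def distinct_lines(lines):
    """Return one instance of every line, in first-seen order."""
    return list(dict.fromkeys(lines))
```

Either result can be passed to out_file.writelines(...) inside the same with block used above.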

Answer 2

You can collect the duplicates in a second variable and use them to remove the non-unique values from the set.
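
The idea could be sketched roughly as follows. This is one possible reading of the suggestion, not code from the answer; the names unique_lines and write_unique_lines are hypothetical, and the extra list is only there to keep the output in file order.

```python
def unique_lines(lines):
    """Return the lines that occur exactly once, in first-seen order."""
    seen = set()        # every distinct line encountered so far
    duplicates = set()  # lines encountered at least twice
    ordered = []        # distinct lines in the order they first appeared
    for line in lines:
        if line in seen:
            duplicates.add(line)
        else:
            seen.add(line)
            ordered.append(line)
    # Drop everything that turned out to be a duplicate.
    return [line for line in ordered if line not in duplicates]


def write_unique_lines(in_path, out_path):
    """Hypothetical helper: copy only the once-only lines of in_path to out_path."""
    with open(in_path, 'r') as in_file, open(out_path, 'w') as out_file:
        out_file.writelines(unique_lines(in_file))
```

For the files in the question this would be called as write_unique_lines('test.csv', 'final.csv'). Note that lines are compared as raw strings, so a final line without a trailing newline will not match an otherwise identical earlier line.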

Answer 1
2, accepted, 2016-03-30 14:23:02

Answer 2
0, 2016-03-30 14:23:07