Python extract unique values from CSV
I am using the following Python script to remove duplicates from a CSV file:
with open('test.csv', 'r') as in_file, open('final.csv', 'w') as out_file:
    seen = set()  # set for fast O(1) amortized lookup
    for line in in_file:
        if line in seen:
            continue  # skip duplicate
        seen.add(line)
        out_file.write(line)
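I am trying to modify it so that, instead of writing the de-duplicated list to final.csv, it writes only the unique values it finds.
That is roughly the opposite of what it does now. Does anyone have an example?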
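Use a dict to track how many times each line occurs. You can then go through the dict, add only the items that occur once to the seen set, and write them to final.csv: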
from collections import defaultdict

uniques = defaultdict(int)  # line -> number of occurrences
with open('test.csv', 'r') as in_file, open('final.csv', 'w') as out_file:
    seen = set()  # lines that occur exactly once
    for line in in_file:
        uniques[line] += 1
    for k, v in uniques.items():  # items() in Python 3 (iteritems() was Python 2 only)
        if v == 1:  # comparison needs ==, not =
            seen.add(k)
            out_file.write(k)
Or:
from collections import defaultdict

uniques = defaultdict(int)  # line -> number of occurrences
with open('test.csv', 'r') as in_file, open('final.csv', 'w') as out_file:
    for line in in_file:
        uniques[line] += 1
    # keep only the lines that were counted exactly once
    seen = set(k for k in uniques if uniques[k] == 1)
    for itm in seen:  # note: a set is iterated in arbitrary order
        out_file.write(itm)
Or, use Counter:
from collections import Counter

with open('test.csv', 'r') as in_file, open('final.csv', 'w') as out_file:
    lines = Counter(in_file.readlines())  # line -> number of occurrences
    # keep only the lines that were counted exactly once
    seen = set(k for k in lines if lines[k] == 1)
    for itm in seen:  # note: a set is iterated in arbitrary order
        out_file.write(itm)
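This writes only the lines that appear exactly once, which may or may not be what you want, depending on what you mean by "unique". If you instead want to write every line but only one instance of each, using the last approach: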
from collections import Counter

with open('test.csv', 'r') as in_file, open('final.csv', 'w') as out_file:
    lines = Counter(in_file.readlines())  # keys are the distinct lines
    for itm in lines:  # one instance of each line
        out_file.write(itm)
You can collect the duplicates in a second variable and use them to remove the non-unique values from the set.
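A minimal sketch of that idea, assuming the whole file fits in memory. The names lines_seen_once and write_lines_seen_once are made up for this illustration, and the second pass over the input is my own addition so that the output keeps the original line order (a plain set difference would not).

```python
def lines_seen_once(lines):
    """Return the lines that occur exactly once, in their original order.

    `lines` must be a list (it is traversed twice). Hypothetical helper,
    not part of the original answer.
    """
    seen = set()        # every distinct line encountered so far
    duplicates = set()  # lines encountered more than once
    for line in lines:
        if line in seen:
            duplicates.add(line)
        else:
            seen.add(line)
    # Second pass: drop anything that was flagged as a duplicate.
    return [line for line in lines if line not in duplicates]


def write_lines_seen_once(src_path, dst_path):
    """Copy to dst_path only those lines of src_path that occur exactly once."""
    with open(src_path, 'r') as in_file, open(dst_path, 'w') as out_file:
        out_file.writelines(lines_seen_once(in_file.readlines()))
```

With the file names from the question this would be called as write_lines_seen_once('test.csv', 'final.csv'). Lines are compared as raw strings, so a final line without a trailing newline counts as different from the same text followed by one.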