
Python extract unique values from CSV

I am using the following Python script to remove duplicates from a CSV file:

with open('test.csv','r') as in_file, open('final.csv','w') as out_file:
    seen = set() # set for fast O(1) amortized lookup
    for line in in_file:
        if line in seen: continue # skip duplicate

        seen.add(line)
        out_file.write(line)

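I am trying to modify it so that, instead of writing the de-duplicated list to final.csv, it writes only the unique values that were found.

That is roughly the opposite of what it does now. Does anyone have an example?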

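Use a dict to track how many times each line occurs. You can then go through the dict, add only the items that occur once to the seen set, and write them to final.csv: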

from collections import defaultdict

uniques = defaultdict(int)  # maps each line to its number of occurrences
with open('test.csv','r') as in_file, open('final.csv','w') as out_file:
    seen = set() # lines that occur exactly once
    for line in in_file:
        uniques[line] += 1
    for k, v in uniques.items():  # .items() in Python 3; .iteritems() was Python 2 only
        if v == 1:  # comparison (==), not assignment (=)
            seen.add(k)
            out_file.write(k)

Or:

from collections import defaultdict
uniques = defaultdict(int)
with open('test.csv','r') as in_file, open('final.csv','w') as out_file:
    seen = set() # set for fast O(1) amortized lookup
    for line in in_file:
        uniques[line] +=1

    seen = set(k for k in uniques if uniques[k] == 1)
    for itm in seen:
        out_file.write(itm)

Or, using Counter:

from collections import Counter

with open('test.csv','r') as in_file, open('final.csv','w') as out_file:
    seen = set() # set for fast O(1) amortized lookup
    lines = Counter(in_file.readlines())  # read from in_file; `file` is not defined here
    seen = set(k for k in lines if lines[k] == 1)
    for itm in seen:
        out_file.write(itm)

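This outputs the lines that occur exactly once, which may or may not be what you want, depending on what you mean by "unique". If you instead want to output every distinct line, but only one instance of each, the last approach becomes: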

from collections import Counter

with open('test.csv','r') as in_file, open('final.csv','w') as out_file:
    lines = Counter(in_file.readlines())  # keys are the distinct lines

    for itm in lines:
        out_file.write(itm)
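To make the difference between the two readings of "unique" concrete, here is a small sketch on made-up data. The `sample` list and the names `only_once` and `distinct` are illustrative and do not come from the question.

```python
from collections import Counter

# Made-up input: "a" appears twice, "b" and "c" once each.
sample = ["a\n", "b\n", "a\n", "c\n"]
counts = Counter(sample)

# Reading 1: lines that occur exactly once.
only_once = [line for line, n in counts.items() if n == 1]

# Reading 2: one instance of every distinct line.
# Counter is a dict subclass, so on Python 3.7+ the keys keep first-seen order.
distinct = list(counts)
```

With this input, `only_once` holds the "b" and "c" lines, while `distinct` holds "a", "b" and "c".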

You could also collect the duplicates in a separate variable and use them to remove the non-unique values from the set.
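A minimal sketch of that idea, assuming the whole input fits in memory. The function name `lines_seen_once` and the `duplicates` set are hypothetical names chosen for illustration, not part of the answer.

```python
def lines_seen_once(lines):
    """Return the lines that occur exactly once, in their original order."""
    lines = list(lines)  # materialize, since the input is iterated twice
    seen = set()         # every line encountered so far
    duplicates = set()   # lines encountered more than once
    for line in lines:
        if line in seen:
            duplicates.add(line)
        else:
            seen.add(line)
    # Drop the non-unique values; filtering the list keeps the input order,
    # which a plain set difference (seen - duplicates) would not.
    return [line for line in lines if line not in duplicates]
```

Inside the `with` block from the question, this could be used as `out_file.writelines(lines_seen_once(in_file))`.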
