
Efficiently Removing Duplicates from a CSV in Python

Original 2011-07-28 18:21:04 9 2 python/ csv/ performance

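I am trying to efficiently remove duplicate rows from relatively large (several hundred MB) CSV files that are not sorted in any meaningful way. I have a technique that does this, but it is brute force, and I am sure there is a more elegant and efficient way.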

2 answers

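To remove duplicates you need some kind of memory that tells you whether you have seen a line before. You can remember the lines themselves, or a checksum of each line (which is almost safe, since two different lines could in principle share a checksum). Any solution along these lines will probably have a "brute force" feel to it.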
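The "remember a checksum" idea can be sketched as follows. This is a minimal illustration in Python 3, not the answerer's code: the names `dedup_lines` and `dedup_file`, the choice of SHA-1 as the checksum, and the UTF-8 encoding are all assumptions. It also treats each physical line as one record, so it would not suit CSV files with newlines inside quoted fields.

```python
import hashlib


def dedup_lines(lines):
    """Yield each distinct line once, in first-seen order."""
    seen = set()
    for line in lines:
        # Keep a fixed-size digest rather than the full line, so memory
        # per remembered line does not grow with line length.
        digest = hashlib.sha1(line.encode("utf-8")).digest()
        if digest not in seen:
            seen.add(digest)
            yield line


def dedup_file(src_path, dst_path):
    """Stream src_path to dst_path, dropping repeated lines (hypothetical helper)."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        dst.writelines(dedup_lines(src))
```

Because the input is read line by line, only the set of digests has to fit in memory, not the whole file.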

If you can sort the lines before processing them, the task becomes fairly easy, because duplicates will end up next to each other.
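Once the rows are sorted, collapsing adjacent duplicates takes a single pass. The sketch below uses `itertools.groupby`; the function name `dedup_sorted` is hypothetical, and it assumes the rows fit in memory for sorting (a file too large for that would need an external sort first).

```python
from itertools import groupby


def dedup_sorted(rows):
    """Collapse runs of equal adjacent rows; input must already be sorted."""
    for row, _run in groupby(rows):
        # groupby starts a new group whenever the row changes, so each
        # group key is one distinct row.
        yield row
```

For example, `list(dedup_sorted(sorted(rows)))` returns the distinct rows in sorted order. Unlike the set-based approach, this does not preserve the original row order.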

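The following assumes that the rows you get from the CSV end up as a list of lists. You then have to decide on what basis you want to de-duplicate (i.e. which column). In the example below it is the first column (x[0]).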

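def dedup(seq):
    """De-duplicate a list of rows based on the first member of each row."""
    seen = set()
    seen_add = seen.add  # bind the method once to avoid repeated lookups
    # set.add returns None, so "not seen_add(...)" is always True; it is
    # only there to record the key as a side effect.
    return [x for x in seq
            if x[0] not in seen
            and not seen_add(x[0])]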
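The function above works on rows that are already in memory. For a file of several hundred MB, a streaming variant built on the standard `csv` module may be preferable. The sketch below is one possible arrangement: `dedup_rows`, `dedup_csv`, and the `key_index` parameter are hypothetical names, and skipping blank lines is an assumption about the desired behavior.

```python
import csv


def dedup_rows(rows, key_index=0):
    """Yield rows whose value in column key_index has not been seen yet."""
    seen = set()
    for row in rows:
        if not row:
            continue  # assumption: blank lines are dropped
        key = row[key_index]
        if key not in seen:
            seen.add(key)
            yield row


def dedup_csv(src, dst, key_index=0):
    """Copy CSV rows from file object src to dst, keeping the first row per key."""
    writer = csv.writer(dst)
    writer.writerows(dedup_rows(csv.reader(src), key_index))
```

Both file objects should be opened with `newline=""`, as the `csv` module documentation recommends. Only the set of keys is held in memory, and the first row seen for each key is the one that is kept.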


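Notice: Technical posts on this site follow the CC BY-SA 4.0 license. If you reproduce this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.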
