Faster way to remove duplicates from a very large text file in Python?
What is the optimal way to remove duplicates from a list of sorted, very large files (200 G each)?
I have a series of large files (200 G each); each file is sorted and contains duplicates that look like this:
50.21.180.100|a.ac
50.21.180.100|a.ac
50.21.180.100|a.ac
50.21.180.100|a.ac
50.21.180.100|a.ac
50.21.180.100| b.ac
50.21.180.100| b.ac
50.21.180.100|b.ac
50.21.180.100|b.ac
50.21.180.100|b.ac
50.21.180.100| c.ac
50.21.180.100| c.ac
50.21.180.100|c.ac
50.21.180.100|c.ac
50.21.180.100|c.ac
50.21.180.100|c.ac
50.21.180.100| d.ac
Expected output:
50.21.180.100|a.ac
50.21.180.100|b.ac
50.21.180.100|c.ac
50.21.180.100|d.ac
Can anyone suggest the best way (in terms of time and memory) to remove these duplicates? Should it be done in Linux bash, Python, or another language?
Strip the spaces first, then run uniq:
cat infile.txt | tr -d " " | uniq > outfile.txt
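
If a Python solution is preferred, the same idea can be applied: normalize each line by removing spaces, then drop adjacent duplicates, which is sufficient because the input is already sorted. The following is a minimal sketch, not a drop-in replacement; the file names infile.txt and outfile.txt and the helper name dedupe_sorted are placeholders:

def dedupe_sorted(in_path, out_path):
    prev = None
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            # Remove spaces so "50.21.180.100| b.ac" and "50.21.180.100|b.ac"
            # compare as equal (the newline is kept).
            cleaned = line.replace(" ", "")
            # The file is sorted, so duplicates are always adjacent:
            # write a line only when it differs from the previous one.
            if cleaned != prev:
                dst.write(cleaned)
                prev = cleaned

if __name__ == "__main__":
    dedupe_sorted("infile.txt", "outfile.txt")

Because only the previous line is kept in memory, this streams through a 200 G file with constant memory use. A global sort -u would also work, but since the data is already sorted, a single pass like this (or the tr | uniq pipeline above) is cheaper.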