Finding duplicates in a list/file. [Groovy/Java]

I have an input file where each line is a special record. I would gladly work at the file level, but it might be more convenient to load the file into a list (each object in the list = one row in the file). The input file can contain several duplicate rows.

The goal: split the given file/list into unique records and duplicate records. That is, for records that are present multiple times, keep one occurrence and store the remaining duplicate occurrences in a new list. I found an easy way to remove duplicates, but never found a way to store them.

File inputFile = new File("....")
List inputList = []
inputFile.eachLine { inputList.add(it) } // fill the list
// sample content, for illustration:
inputList = [1,1,3,3,1,2,2,3,4,1,5,6,7,7,8,9,8,10]
inputList = inputList.unique() // remove duplicates
println inputList
// inputList = [1, 3, 2, 4, 5, 6, 7, 8, 9, 10]

The output should look like this: two lists/files, one with the duplicates removed and one holding the duplicates themselves.

inputList = [1,3,2,4,5,6,7,8,9,10] // only one occurrence of each line
listOfDuplicates = [1,1,1,3,3,2,7,8] // duplicates removed from the original list

The output does not need to preserve the initial order of the items. Thank you for your help, Matt

You could simply iterate over the list yourself:

def inputList = [1,1,3,3,1,2,2,3,4,1,5,6,7,7,8,9,8,10]

def uniques = []
def duplicates = []

inputList.each { uniques.contains(it) ? duplicates << it : uniques << it }

assert inputList.size() == uniques.size() + duplicates.size()
assert uniques == [1,3,2,4,5,6,7,8,9,10] // only one occurrence of each line
assert duplicates == [1,3,1,2,3,1,7,8] //duplicates removed from original list

inputList = uniques // if desired
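
For long inputs, `uniques.contains(it)` scans the list once per element, so the loop above is quadratic in the worst case. The following is a minimal Java sketch of the same single pass, assuming a `LinkedHashSet` to remember what has already been seen. The class `DuplicateSplitter`, the `Split` holder, and the method name are illustrative and not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class DuplicateSplitter {
    /** Hypothetical holder for the two result lists. */
    static final class Split<T> {
        final List<T> uniques;
        final List<T> duplicates;

        Split(List<T> uniques, List<T> duplicates) {
            this.uniques = uniques;
            this.duplicates = duplicates;
        }
    }

    static <T> Split<T> split(List<T> input) {
        // LinkedHashSet keeps first-seen order and gives constant-time lookups on average.
        Set<T> seen = new LinkedHashSet<>();
        List<T> duplicates = new ArrayList<>();
        for (T item : input) {
            // Set.add returns false when the element is already present.
            if (!seen.add(item)) {
                duplicates.add(item);
            }
        }
        return new Split<>(new ArrayList<>(seen), duplicates);
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 1, 3, 3, 1, 2, 2, 3, 4, 1, 5, 6, 7, 7, 8, 9, 8, 10);
        Split<Integer> result = split(input);
        System.out.println(result.uniques);    // [1, 3, 2, 4, 5, 6, 7, 8, 9, 10]
        System.out.println(result.duplicates); // [1, 3, 1, 2, 3, 1, 7, 8]
    }
}
```

The duplicates come out in the same order as in the Groovy loop, because both versions classify each element at the moment it is visited.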

There are many ways to do this; the following is the simplest:

def list = [1,1,3,3,1,2,2,3,4,1,5,6,7,7,8,9,8,10]
def unique = []
def duplicates = []
list.each {
    if (unique.contains(it))
        duplicates.add(it)
    else
        unique.add(it)
}
println list // [1, 1, 3, 3, 1, 2, 2, 3, 4, 1, 5, 6, 7, 7, 8, 9, 8, 10]
println unique // [1, 3, 2, 4, 5, 6, 7, 8, 9, 10]
println duplicates // [1, 3, 1, 2, 3, 1, 7, 8]

Hope this helps.
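
Since the question starts from a file, the same split can also be done at the file level. This is a rough Java sketch under stated assumptions: the whole input fits in memory, lines are compared as exact strings, and the files are UTF-8. The class name, method name, and the three paths are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class FileDuplicateSplitter {
    /** Writes the first occurrence of each line to uniquesOut and every repeat to duplicatesOut. */
    static void splitFile(Path input, Path uniquesOut, Path duplicatesOut) throws IOException {
        Set<String> uniques = new LinkedHashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
            // add() returns false if the line was already recorded as unique
            if (!uniques.add(line)) {
                duplicates.add(line);
            }
        }
        Files.write(uniquesOut, uniques, StandardCharsets.UTF_8);
        Files.write(duplicatesOut, duplicates, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for the real input and output locations.
        Path input = Files.createTempFile("records", ".txt");
        Path uniquesOut = Files.createTempFile("uniques", ".txt");
        Path duplicatesOut = Files.createTempFile("duplicates", ".txt");
        Files.write(input, Arrays.asList("a", "b", "a", "c", "b", "a"), StandardCharsets.UTF_8);

        splitFile(input, uniquesOut, duplicatesOut);

        System.out.println(Files.readAllLines(uniquesOut, StandardCharsets.UTF_8));    // [a, b, c]
        System.out.println(Files.readAllLines(duplicatesOut, StandardCharsets.UTF_8)); // [a, b, a]
    }
}
```

For inputs too large to hold in memory, a different strategy (for example sorting the file first) would be needed; that is outside what this sketch covers.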

Something very straightforward:

List inputList = [1,1,3,3,1,2,2,3,4,1,5,6,7,7,8,9,8,10] 
def uniques = [], duplicates = []

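// Note: this consumes inputList; it is empty after the loop.
Iterator iter = inputList.iterator()
iter.each{
  iter.remove() // drop the current element, then check whether it occurs again later
  inputList.contains( it ) ? ( duplicates << it ) : ( uniques << it )
}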

assert [2, 3, 4, 1, 5, 6, 7, 9, 8, 10] == uniques
assert [1,1,3,3,1,2,7,8] == duplicates

This code should solve the problem:

List listOfDuplicates = inputList.clone()
listOfDuplicates.removeAll {
    listOfDuplicates.count(it) == 1
}
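
As far as I can tell, this snippet removes only the values that occur exactly once, so the result keeps every occurrence of a repeated value, including the first one. That differs from the `listOfDuplicates` requested in the question. A small Java sketch of the same counting idea follows, assuming a `HashMap` of counts built in one pass instead of calling `count` per element; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RepeatedValues {
    /** Returns every element whose value occurs more than once, in input order. */
    static <T> List<T> repeatedOccurrences(List<T> input) {
        // First pass: count how often each value occurs.
        Map<T, Integer> counts = new HashMap<>();
        for (T item : input) {
            counts.merge(item, 1, Integer::sum);
        }
        // Second pass: keep only the elements whose value is repeated.
        List<T> result = new ArrayList<>();
        for (T item : input) {
            if (counts.get(item) > 1) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 1, 3, 3, 1, 2, 2, 3, 4, 1, 5, 6, 7, 7, 8, 9, 8, 10);
        System.out.println(repeatedOccurrences(input));
        // [1, 1, 3, 3, 1, 2, 2, 3, 1, 7, 7, 8, 8]
    }
}
```

This is useful when you want to see all copies of the offending records. To match the question's output, the first occurrence of each value would still have to be dropped.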

If the order of the duplicates isn't important:

def list = [1,1,3,3,1,2,2,3,4,1,5,6,7,7,8,9,8,10]
def (unique, dups) = list.groupBy().values()*.with{ [it[0..0], tail()] }.transpose()*.sum()
assert unique == [1,3,2,4,5,6,7,8,9,10]
assert dups == [1,1,1,3,3,2,7,8]

The more the merrier:

groovy:000> list.groupBy().values()*.tail().flatten()
===> [1, 1, 1, 3, 3, 2, 7, 8]
  1. Group by identity (this is basically a "frequencies" function).
  2. Take just the values.
  3. Clip the first element.
  4. Combine the lists.
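
The four steps above can be sketched in Java with streams. This assumes Java 8 or later and uses `Collectors.groupingBy` with a `LinkedHashMap` so that groups keep first-seen order, as in the Groovy output. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class GroupedDuplicates {
    static <T> List<T> duplicatesByGroup(List<T> input) {
        // 1. Group by identity, keeping the order in which values first appear.
        Map<T, List<T>> groups = input.stream()
                .collect(Collectors.groupingBy(
                        Function.identity(), LinkedHashMap::new, Collectors.toList()));
        // 2. Take just the values, 3. skip the first element of each group,
        // 4. and flatten the remainders into one list.
        return groups.values().stream()
                .flatMap(group -> group.stream().skip(1))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> list = Arrays.asList(1, 1, 3, 3, 1, 2, 2, 3, 4, 1, 5, 6, 7, 7, 8, 9, 8, 10);
        System.out.println(duplicatesByGroup(list)); // [1, 1, 1, 3, 3, 2, 7, 8]
    }
}
```

Because the result is assembled group by group, the duplicates are clustered by value rather than listed in their original positions.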


 