
Remove “regex duplicates” from ArrayList in Java

I want to "clean" an ArrayList in Java. Here is what I mean.

Assume we have this list:

a = ["a_12_b", "a_13_b", "a_13bis_b", "a_14_b", "a_14_new_b"]

In this list, "a_13bis_b" and "a_14_new_b" are considered duplicates. Why? Because each entry should follow this pattern: a_ + a string with a length of 2 + _b. So "a_13bis_b" is a duplicate of "a_13_b", and "a_14_new_b" is a duplicate of "a_14_b".

The output should be:

a = ["a_12_b", "a_13_b", "a_14_b"]
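To make the rule concrete: two entries are duplicates when the two characters right after a_ are the same. A minimal sketch, assuming every entry starts with a_ followed by at least two characters; the class and method names (DuplicateKey, keyOf) are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DuplicateKey {
    // Assumption: the 2 characters right after "a_" identify an entry.
    private static final Pattern KEY = Pattern.compile("^a_(.{2})");

    // Returns the 2-character key, or null if the entry does not start with "a_" plus two characters.
    public static String keyOf(String entry) {
        Matcher m = KEY.matcher(entry);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] entries = {"a_12_b", "a_13_b", "a_13bis_b", "a_14_b", "a_14_new_b"};
        for (String e : entries) {
            // Entries that share a key are duplicates of each other.
            System.out.println(e + " -> " + keyOf(e));
        }
    }
}
```

For the sample list this prints the keys 12, 13, 13, 14, 14, so "a_13bis_b" shares its key with "a_13_b" and "a_14_new_b" shares its key with "a_14_b".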

I used this simple code, but it returns the wrong output:

for (int j = 0; j < list.size(); j++) {
    // basically, cleanEntry will remove the a_ and _b
    String value1 = cleanEntry(list.get(j));
    for (int k = 0; k < list.size(); k++) {
        String value2 = cleanEntry(list.get(k));
        if (k != j && value1.equalsIgnoreCase(value2)) {
            duplicates.add(list.get(k));
            list.remove(k);
        }
    }
}
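One likely reason for the wrong output: cleanEntry is not shown, but if it only strips a_ and _b as its comment says, the cleaned value of "a_13bis_b" is "13bis", which never equals "13". Separately, calling list.remove(k) while k keeps increasing skips the element that shifts into index k. The sketch below uses a hypothetical reconstruction of cleanEntry (not the asker's actual code) to show the first problem:

```java
public class CleanEntryDemo {
    // Hypothetical reconstruction of the question's cleanEntry: strip a leading "a_" and a trailing "_b".
    public static String cleanEntry(String s) {
        String r = s;
        if (r.startsWith("a_")) {
            r = r.substring(2);
        }
        if (r.endsWith("_b")) {
            r = r.substring(0, r.length() - 2);
        }
        return r;
    }

    public static void main(String[] args) {
        System.out.println(cleanEntry("a_13_b"));     // 13
        System.out.println(cleanEntry("a_13bis_b"));  // 13bis
        System.out.println(cleanEntry("a_14_new_b")); // 14_new
        // The full cleaned values differ, so equalsIgnoreCase never reports a duplicate.
        System.out.println(cleanEntry("a_13_b").equalsIgnoreCase(cleanEntry("a_13bis_b"))); // false
    }
}
```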

Any help?

You could use the Stream map method with a regular expression to "normalize" the strings to a common format, then collect the normalized strings into a set.

Something like this:

import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

List<String> a = Arrays.asList("a_12_b", "a_13_b", "a_13bis_b", "a_14_b", "a_14_new_b");
Set<String> uniques = a.stream()
                .map(s -> s.replaceAll("^([a-z]_\\d{2})[^\\d].+(_[a-z])$", "$1$2"))
                .collect(Collectors.toSet());
System.out.println(uniques);

This prints (the order may differ, because Collectors.toSet() makes no ordering guarantee):

[a_14_b, a_13_b, a_12_b]
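Note that the set above holds the normalized strings, not the original entries. If you want to keep the original entries (the first one for each normalized form, in list order), one possible stream variant uses Collectors.toMap with a merge function and a LinkedHashMap. This is a sketch assuming Java 8+ and the same regex as above; the class name StreamKeepOriginals is hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class StreamKeepOriginals {
    static final String REGEX = "^([a-z]_\\d{2})[^\\d].+(_[a-z])$";

    // Returns the first original entry for each normalized key, in encounter order.
    public static List<String> firstPerKey(List<String> input) {
        Map<String, String> firstByKey = input.stream().collect(Collectors.toMap(
                s -> s.replaceAll(REGEX, "$1$2"), // key: the normalized form
                Function.identity(),              // value: the original entry
                (first, later) -> first,          // on a key collision, keep the earlier entry
                LinkedHashMap::new));             // preserve encounter order
        return new ArrayList<>(firstByKey.values());
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a_12_b", "a_13_b", "a_13bis_b", "a_14_b", "a_14_new_b");
        System.out.println(firstPerKey(a)); // [a_12_b, a_13_b, a_14_b]
    }
}
```

Because the merge function keeps the earlier entry, a list that had "a_13bis_b" before "a_13_b" would keep "a_13bis_b".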

Solution for Java 6 and 7:

List<String> a = Arrays.asList("a_12_b", "a_13_b", "a_13bis_b", "a_14_b", "a_14_new_b");
Set<String> set = new LinkedHashSet<String>(); // explicit type argument: the diamond <> requires Java 7
for(String s : a) {
    set.add(s.replaceAll("^([a-z]_\\d{2})[^\\d].+(_[a-z])$", "$1$2"));
}
System.out.println(set);

Result:

[a_12_b, a_13_b, a_14_b]
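A possible refinement for Java 6 and 7: String.replaceAll compiles the regular expression on every call, so for long lists you can compile the Pattern once and reuse it. This sketch also avoids the diamond operator, which Java 6 does not support; the class and method names (PrecompiledNormalize, normalizeAll) are hypothetical:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class PrecompiledNormalize {
    // Compiled once; String.replaceAll would recompile the pattern on every call.
    private static final Pattern SUFFIX = Pattern.compile("^([a-z]_\\d{2})[^\\d].+(_[a-z])$");

    public static Set<String> normalizeAll(List<String> input) {
        Set<String> result = new LinkedHashSet<String>(); // no diamond, so this compiles on Java 6
        for (String s : input) {
            result.add(SUFFIX.matcher(s).replaceAll("$1$2"));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a_12_b", "a_13_b", "a_13bis_b", "a_14_b", "a_14_new_b");
        System.out.println(normalizeAll(a)); // [a_12_b, a_13_b, a_14_b]
    }
}
```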

If you need more than 2 digits, change the quantifier in the regular expression (here \\d{2} becomes \\d{26}). Here is an example with its result:

List<String> a = Arrays.asList("a_12345678901234567890123456_b", "a_13345678901234567890123456_b",
                "a_13345678901234567890123456bis_b", "a_14345678901234567890123456_b", "a_14345678901234567890123456_new_b");
Set<String> set = new LinkedHashSet<>();
for(String s : a) {
    set.add(s.replaceAll("^([a-z]_\\d{26})[^\\d].+(_[a-z])$", "$1$2"));
}
System.out.println(set);

Result:

[a_12345678901234567890123456_b, a_13345678901234567890123456_b, a_14345678901234567890123456_b]
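Rather than editing the quantifier by hand, the digit count can be a parameter. A minimal sketch with hypothetical names (DigitCountNormalizer, forDigits, normalize) that builds the same pattern as above for a given number of digits:

```java
import java.util.regex.Pattern;

public class DigitCountNormalizer {
    // Hypothetical helper: the answer's pattern, with the number of digits as a parameter.
    public static Pattern forDigits(int digits) {
        return Pattern.compile("^([a-z]_\\d{" + digits + "})[^\\d].+(_[a-z])$");
    }

    // Strips the extra suffix if the entry matches; otherwise returns it unchanged.
    public static String normalize(String s, Pattern p) {
        return p.matcher(s).replaceAll("$1$2");
    }

    public static void main(String[] args) {
        Pattern two = forDigits(2);
        Pattern many = forDigits(26);
        System.out.println(normalize("a_13bis_b", two));                          // a_13_b
        System.out.println(normalize("a_14345678901234567890123456_new_b", many)); // a_14345678901234567890123456_b
    }
}
```

Compile each Pattern once and reuse it across the list rather than calling forDigits per entry.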

You can simply discard everything after the second character of each cleaned entry before comparing. Try this:

for (int j = 0; j < list.size(); j++) {
    // basically, cleanEntry will remove the a_ and _b
    String value1 = cleanEntry(list.get(j));
    for (int k = 0; k < list.size(); k++) {
        String value2 = cleanEntry(list.get(k));
        // compare only the first two characters of the cleaned entries
        if (k != j && value1.substring(0, 2).equalsIgnoreCase(value2.substring(0, 2))) {
            duplicates.add(list.get(k));
            list.remove(k);
            k--; // the next element shifted into index k, so check that index again
        }
    }
}
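An alternative sketch of the same prefix comparison that avoids the index bookkeeping entirely: a single pass with an Iterator and a HashSet of keys already seen. It assumes every entry starts with a_ followed by at least two characters; key() stands in for cleanEntry plus substring(0, 2), and the class and method names are hypothetical. It keeps the first entry for each key, so the result depends on the list order:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class IteratorDedupe {
    // Assumption: the two characters after "a_" identify an entry (lower-cased, like equalsIgnoreCase).
    static String key(String entry) {
        return entry.substring(2, 4).toLowerCase(Locale.ROOT);
    }

    // Removes every entry whose key already appeared earlier in the list; returns the removed entries.
    public static List<String> removeDuplicates(List<String> list) {
        List<String> duplicates = new ArrayList<String>();
        Set<String> seen = new HashSet<String>();
        for (Iterator<String> it = list.iterator(); it.hasNext(); ) {
            String s = it.next();
            if (!seen.add(key(s))) { // add() returns false if the key is already present
                duplicates.add(s);
                it.remove();         // safe removal during iteration, no index shifting to track
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Copy into an ArrayList: the list returned by Arrays.asList cannot shrink.
        List<String> list = new ArrayList<String>(Arrays.asList("a_12_b", "a_13_b", "a_13bis_b", "a_14_b", "a_14_new_b"));
        List<String> duplicates = removeDuplicates(list);
        System.out.println(list);       // [a_12_b, a_13_b, a_14_b]
        System.out.println(duplicates); // [a_13bis_b, a_14_new_b]
    }
}
```

This is a single O(n) pass instead of the nested O(n²) loop.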

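Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.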

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM