Remove "regex duplicates" from an ArrayList in Java

Question

I want to "clean" an ArrayList in Java. Here is the explanation.

Assume we have this list:

a = ["a_12_b", "a_13_b", "a_13bis_b", "a_14_b", "a_14_new_b"]

In this list, "a_13bis_b" and "a_14_new_b" are considered duplicates. Why? Because every entry follows the pattern a_ + "a string of length 2" + _b, and only those two characters matter: "a_13bis_b" has the same two characters ("13") as "a_13_b", and "a_14_new_b" has the same two characters ("14") as "a_14_b".

The output should be:

a = ["a_12_b", "a_13_b", "a_14_b"]
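One way to state this rule in code is a regular expression that captures the two characters after "a_" as a comparison key; two entries are duplicates when their keys are equal. This is a minimal sketch, assuming every entry has the shape a_<two characters><anything>_b. The class DedupKey and the method keyOf are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the two-character key that decides whether
// two entries count as duplicates (assumes the shape a_<key><anything>_b).
class DedupKey {
    // Group 1 is the two characters right after "a_"; anything after them is ignored.
    private static final Pattern ENTRY = Pattern.compile("^a_(.{2}).*_b$");

    static String keyOf(String entry) {
        Matcher m = ENTRY.matcher(entry);
        // Entries that do not fit the assumed shape are used whole as their own key.
        return m.matches() ? m.group(1) : entry;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a_12_b", "a_13_b", "a_13bis_b", "a_14_b", "a_14_new_b");
        List<String> keys = new ArrayList<>();
        for (String s : a) {
            keys.add(keyOf(s));
        }
        System.out.println(keys); // [12, 13, 13, 14, 14]
    }
}
```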

I used this simple code, but it returns the wrong output:

for (int j = 0; j < list.size(); j++) {
    // cleanEntry removes the leading a_ and the trailing _b
    String value1 = cleanEntry(list.get(j));
    for (int k = 0; k < list.size(); k++) {
        String value2 = cleanEntry(list.get(k));
        if (k != j && value1.equalsIgnoreCase(value2)) {
            duplicates.add(list.get(k));
            list.remove(k);
        }
    }
}
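The question does not show cleanEntry. Assuming it only strips the leading "a_" and the trailing "_b" (a hypothetical reconstruction based on the code comment), the comparison sees "13" and "13bis" as different strings, so no entry is ever flagged. The sketch below runs the question's loop on the example list to show this; CleanEntryDemo and findDuplicates are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CleanEntryDemo {
    // Hypothetical version of the question's cleanEntry: strips a leading "a_"
    // and a trailing "_b". The real implementation is not shown in the question.
    static String cleanEntry(String s) {
        String result = s;
        if (result.startsWith("a_")) {
            result = result.substring(2);
        }
        if (result.endsWith("_b")) {
            result = result.substring(0, result.length() - 2);
        }
        return result;
    }

    // The question's nested loop, unchanged apart from the missing semicolon.
    static List<String> findDuplicates(List<String> list) {
        List<String> duplicates = new ArrayList<>();
        for (int j = 0; j < list.size(); j++) {
            String value1 = cleanEntry(list.get(j));
            for (int k = 0; k < list.size(); k++) {
                String value2 = cleanEntry(list.get(k));
                if (k != j && value1.equalsIgnoreCase(value2)) {
                    duplicates.add(list.get(k));
                    list.remove(k);
                }
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList(
                "a_12_b", "a_13_b", "a_13bis_b", "a_14_b", "a_14_new_b"));
        // "13" vs "13bis" and "14" vs "14_new" are different whole strings,
        // so the comparison never finds a duplicate.
        System.out.println(cleanEntry("a_13bis_b")); // 13bis
        System.out.println(findDuplicates(list));    // []
        System.out.println(list); // [a_12_b, a_13_b, a_13bis_b, a_14_b, a_14_new_b]
    }
}
```

A second, separate issue: after list.remove(k), the next element shifts into index k, and the loop's k++ then skips it without comparing.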

Any help?

Answer 1

You could use the Stream map method with a regular expression to "normalize" the strings to a common format, and then collect the normalized strings into a set.

Something like this:

import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

List<String> a = Arrays.asList("a_12_b", "a_13_b", "a_13bis_b", "a_14_b", "a_14_new_b");
Set<String> uniques = a.stream()
                .map(s -> s.replaceAll("^([a-z]_\\d{2})[^\\d].+(_[a-z])$", "$1$2"))
                .collect(Collectors.toSet());
System.out.println(uniques);

This prints (the element order is not guaranteed, because Collectors.toSet() returns an unordered set):

[a_14_b, a_13_b, a_12_b]
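If the result should keep the input order, the stream can collect into a LinkedHashSet via Collectors.toCollection instead of Collectors.toSet(). A sketch, assuming Java 8 or later and reusing the answer's regex; OrderedNormalize and normalize are illustrative names only.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class OrderedNormalize {
    // Same regex as the answer: keep "<letter>_<2 digits>" and the final "_<letter>",
    // dropping whatever sits between them.
    static Set<String> normalize(List<String> input) {
        return input.stream()
                .map(s -> s.replaceAll("^([a-z]_\\d{2})[^\\d].+(_[a-z])$", "$1$2"))
                // LinkedHashSet keeps first-seen order, unlike Collectors.toSet()
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a_12_b", "a_13_b", "a_13bis_b", "a_14_b", "a_14_new_b");
        System.out.println(normalize(a)); // [a_12_b, a_13_b, a_14_b]
    }
}
```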

Solution for Java 6 and 7 (no streams; LinkedHashSet also keeps insertion order):

List<String> a = Arrays.asList("a_12_b", "a_13_b", "a_13bis_b", "a_14_b", "a_14_new_b");
// Java 6 has no diamond operator, so the type argument is spelled out
Set<String> set = new LinkedHashSet<String>();
for (String s : a) {
    set.add(s.replaceAll("^([a-z]_\\d{2})[^\\d].+(_[a-z])$", "$1$2"));
}
System.out.println(set);

Result:

[a_12_b, a_13_b, a_14_b]

If the numeric part has more than two digits, change the quantifier in the regular expression (for example \d{26} instead of \d{2}). Here is an example with its result:

List<String> a = Arrays.asList("a_12345678901234567890123456_b", "a_13345678901234567890123456_b",
                "a_13345678901234567890123456bis_b", "a_14345678901234567890123456_b", "a_14345678901234567890123456_new_b");
Set<String> set = new LinkedHashSet<String>();
for (String s : a) {
    set.add(s.replaceAll("^([a-z]_\\d{26})[^\\d].+(_[a-z])$", "$1$2"));
}
System.out.println(set);

Result:

[a_12345678901234567890123456_b, a_13345678901234567890123456_b, a_14345678901234567890123456_b]
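Rather than editing the regex literal for each key length, the digit count could be passed in and the pattern compiled once per call (String.replaceAll compiles its regex on every call). A sketch under the same assumptions as the answer's regex (one lowercase letter, an underscore, the digits, a non-digit suffix, then a final _<letter>); DigitCountNormalize and uniques are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

class DigitCountNormalize {
    // Builds the answer's regex with a configurable number of digits in the key
    // and collapses each entry to "<letter>_<digits>_<letter>".
    static Set<String> uniques(List<String> input, int digits) {
        Pattern p = Pattern.compile("^([a-z]_\\d{" + digits + "})[^\\d].+(_[a-z])$");
        Set<String> set = new LinkedHashSet<String>(); // keeps insertion order
        for (String s : input) {
            set.add(p.matcher(s).replaceAll("$1$2"));
        }
        return set;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("a_12_b", "a_13_b", "a_13bis_b", "a_14_b", "a_14_new_b");
        System.out.println(uniques(a, 2)); // [a_12_b, a_13_b, a_14_b]
    }
}
```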

Answer 2

You can simply discard every character after the second one before comparing. Try this:

for (int j = 0; j < list.size(); j++) {
    // cleanEntry removes the leading a_ and the trailing _b
    String value1 = cleanEntry(list.get(j));
    for (int k = 0; k < list.size(); k++) {
        String value2 = cleanEntry(list.get(k));
        // compare only the first two characters of each cleaned entry
        if (k != j && value1.substring(0, 2).equalsIgnoreCase(value2.substring(0, 2))) {
            duplicates.add(list.get(k));
            list.remove(k);
            k--; // the next element has shifted into index k, so check it again
        }
    }
}
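The nested loops compare every pair of entries. The same idea (compare only the first two cleaned characters) also works in a single pass: remember the keys already seen and remove an entry through its Iterator when its key repeats. A minimal sketch; cleanEntry here is a hypothetical stand-in for the question's helper, and lowercasing the key only approximates equalsIgnoreCase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class PrefixDedupe {
    // Hypothetical stand-in for the question's cleanEntry: strips "a_" and "_b".
    static String cleanEntry(String s) {
        String r = s.startsWith("a_") ? s.substring(2) : s;
        return r.endsWith("_b") ? r.substring(0, r.length() - 2) : r;
    }

    // Removes, in place, every entry whose first two cleaned characters were
    // already seen, keeping the first occurrence. Returns the removed entries.
    static List<String> removeDuplicates(List<String> list) {
        List<String> duplicates = new ArrayList<String>();
        Set<String> seenKeys = new HashSet<String>();
        Iterator<String> it = list.iterator();
        while (it.hasNext()) {
            String entry = it.next();
            String cleaned = cleanEntry(entry);
            // Guard against entries shorter than two characters after cleaning.
            String key = cleaned.length() >= 2 ? cleaned.substring(0, 2) : cleaned;
            if (!seenKeys.add(key.toLowerCase(Locale.ROOT))) {
                duplicates.add(entry);
                it.remove(); // safe removal while iterating
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList(
                "a_12_b", "a_13_b", "a_13bis_b", "a_14_b", "a_14_new_b"));
        List<String> removed = removeDuplicates(list);
        System.out.println(list);    // [a_12_b, a_13_b, a_14_b]
        System.out.println(removed); // [a_13bis_b, a_14_new_b]
    }
}
```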

Answer 1: score 1, 2017-06-28 14:53:47
Answer 2: score 0, 2017-06-28 13:58:51
