Removing duplicates from a String[]

Question

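I made a web scraper that pulls links, and every now and then I get duplicate entries. I store the links in a String[] array. I've added a couple of loops to check for duplicates, but I haven't managed to actually remove them.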

    public static void checkForDupes(String[] links) {
        for (int i = 0; i < links.length; i++) {
            String check = links[i];

            for (String link : links) {
                if (check.equals(link)) {
                    // remove link
                }
            }
        }
    }

Answer 1

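A plain array holds a fixed number of references at fixed positions. You therefore cannot remove objects from it; you can only replace them with other objects or with null, which leaves holes in the middle of the array. That is probably not what you want.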
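To make the point about holes concrete, here is a minimal sketch (the class and method names are made up for illustration) that "removes" duplicates in place by overwriting them with null. Note that the inner loop starts at i + 1: the loop in the question compares every element with itself as well, so check.equals(link) is always true at least once per element.

```java
import java.util.Arrays;

class NullHoles {
    // Hypothetical helper: overwrites every repeated string with null.
    // The array keeps its length, so each "removed" duplicate leaves a null hole.
    static String[] markDuplicatesWithNull(String[] links) {
        for (int i = 0; i < links.length; i++) {
            if (links[i] == null) {
                continue; // already a hole, nothing to compare
            }
            // Start at i + 1 so an element is never compared with itself
            for (int j = i + 1; j < links.length; j++) {
                if (links[i].equals(links[j])) {
                    links[j] = null;
                }
            }
        }
        return links; // same array object, modified in place
    }

    public static void main(String[] args) {
        String[] links = {"a", "b", "a", "c", "b"};
        markDuplicatesWithNull(links);
        System.out.println(Arrays.toString(links)); // [a, b, null, c, null]
        System.out.println(links.length);           // 5
    }
}
```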

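Since you cannot remove elements from an array, you have to replace the whole array with a new one that contains only the entries you want. But that creates a new problem: you must specify the size of the target array up front, and you don't know in advance which size you will end up needing. So you either allocate an array that is large enough and store the actually used length in a separate variable, or you use a LinkedList, which grows as needed and performs well when adding elements.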
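A minimal sketch of the second option, with illustrative names. It uses an ArrayList rather than the LinkedList mentioned above; both grow automatically, and ArrayList is the more common default. Because List.contains is a linear search, this is still quadratic, just like the nested loops in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListDedupe {
    // Builds a new array containing only the first occurrence of each string.
    static String[] removeDuplicates(String[] input) {
        List<String> unique = new ArrayList<>(); // grows as needed, no size guess up front
        for (String s : input) {
            if (!unique.contains(s)) { // linear scan over what we kept so far
                unique.add(s);
            }
        }
        return unique.toArray(new String[0]); // exactly unique.size() elements
    }

    public static void main(String[] args) {
        String[] links = {"a", "b", "a", "c", "b", "c", "d", "e", "f"};
        System.out.println(Arrays.toString(removeDuplicates(links))); // [a, b, c, d, e, f]
    }
}
```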

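Also, nested loops like these get slow quickly as the input grows, because every element is compared with every other element (roughly n² comparisons for n entries). Once you have more than about 20 entries, collecting the values in a HashSet is much faster than this simple for loop, and as a side effect it already eliminates the duplicates.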

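A closely related class is HashMap. Its keys are unique in the same way, but it maps each key to a value, and the values may contain duplicates. It is worth looking up how hashing algorithms and hash maps work; it is a very interesting topic.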
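To illustrate the difference, here is a small sketch (names are hypothetical) that counts how often each link occurs. It uses a LinkedHashMap, a HashMap subclass that keeps insertion order, only so that the printout is predictable. The keys end up unique, just like a set, while the counts stored as values repeat freely.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LinkCounter {
    // Each distinct link becomes one key; the value counts how often it appeared.
    static Map<String, Integer> countLinks(String[] links) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps first-seen order
        for (String link : links) {
            counts.merge(link, 1, Integer::sum); // put 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] links = {"a", "b", "a", "c", "b", "c", "d", "e", "f"};
        Map<String, Integer> counts = countLinks(links);
        System.out.println(counts);          // {a=2, b=2, c=2, d=1, e=1, f=1}
        System.out.println(counts.keySet()); // [a, b, c, d, e, f]
    }
}
```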

Example:

    import java.util.Arrays;
    import java.util.HashSet;

    public class Main
    {
        public static void main(String[] args) throws Exception
        {

            String[] links = {"a","b","a","c","b","c","d","e","f"};

            HashSet<String> set = new HashSet<>();
            set.addAll(Arrays.asList(links));

            System.out.println(set);
        }
    }

Output:

    [a, b, c, d, e, f]
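A HashSet makes no promise about iteration order. If the order in which the links were first found matters, a LinkedHashSet removes duplicates the same way but iterates in insertion order. A minimal sketch (class and method names are illustrative) that also converts the result back into a String[]:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class OrderedDedupe {
    // LinkedHashSet drops duplicates like HashSet but remembers insertion order,
    // so the result keeps the order in which each link was first seen.
    static String[] removeDuplicates(String[] input) {
        Set<String> unique = new LinkedHashSet<>(Arrays.asList(input));
        return unique.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] links = {"c", "a", "c", "b", "a"};
        System.out.println(Arrays.toString(removeDuplicates(links))); // [c, a, b]
    }
}
```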

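A HashSet does not support access by index, so there is no set.get(index). To read its elements, iterate over the set with a for-each loop, or copy it into a List first (for example new ArrayList<>(set)) if you really need positional access.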
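A short sketch of both options; the class and the toIndexedList helper are hypothetical names used only for illustration. The index order of the copied list is simply the set's iteration order, which for a HashSet is unspecified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ReadSet {
    // Hypothetical helper: copies the set into a List so that get(index) is available.
    static List<String> toIndexedList(Set<String> set) {
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        Set<String> set = new HashSet<>(Arrays.asList("a", "b", "a", "c"));

        // Visiting every element: a for-each loop works on any Set
        for (String link : set) {
            System.out.println(link);
        }

        // Positional access: copy into a List first
        List<String> list = toIndexedList(set);
        System.out.println("first element in iteration order: " + list.get(0));
    }
}
```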

The following example shows how to achieve the same result without a HashSet:

    import java.util.Arrays;

    public class Main
    {
        public static void main(String[] args) throws Exception
        {

            String[] input = {"a", "b", "a", "c", "b", "c", "d", "e", "f"};

            String[] output = new String[input.length];
            int count = 0;

            // Iterate over the input array
            for (String in : input)
            {
                // Check if the string is already in the output array
                boolean found = false;
                for (String out : output)
                {
                    if (in.equals(out))
                    {
                        found = true;
                        break; // break the inner for loop, no need to continue the search
                    }
                }

                if (!found)
                {
                    output[count++] = in;
                }
            }

            System.out.println(Arrays.toString(output));
        }
    }

Output:

    [a, b, c, d, e, f, null, null, null]

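Note how I simplified the for loops. Also note that the output array contains some unused slots (the trailing nulls); the counter variable count holds the number of slots that are actually in use.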
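If the trailing nulls are unwanted, one option is to trim the array to count elements at the end with Arrays.copyOf. A minimal sketch under that assumption (the class name is illustrative); it also searches only the first count slots instead of the whole output array, and compares with Objects.equals so a null in the input does not throw.

```java
import java.util.Arrays;
import java.util.Objects;

class TrimOutput {
    // Same first-occurrence idea as the array example above, with two changes:
    // only the first `count` slots are searched, and the result is trimmed to `count`.
    static String[] removeDuplicates(String[] input) {
        String[] output = new String[input.length];
        int count = 0;
        for (String in : input) {
            boolean found = false;
            for (int i = 0; i < count; i++) {
                if (Objects.equals(output[i], in)) { // null-safe comparison
                    found = true;
                    break;
                }
            }
            if (!found) {
                output[count++] = in;
            }
        }
        // Arrays.copyOf returns a new array of exactly `count` elements
        return Arrays.copyOf(output, count);
    }

    public static void main(String[] args) {
        String[] input = {"a", "b", "a", "c", "b", "c", "d", "e", "f"};
        System.out.println(Arrays.toString(removeDuplicates(input))); // [a, b, c, d, e, f]
    }
}
```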

Solution 1 (0 votes, accepted) 2020-02-15 14:33:46