Remove duplicates from String[]

I've written a web scraper that collects links, and every now and then I get duplicate entries. I store the links in a String[]. I've thrown a couple of loops together to check for duplicates, but I can't work out how to remove them.

    public static void checkForDupes(String[] links) {

        for (int i = 0; i < links.length; i++) {
            String check = links[i];

            for (String link : links) {

                if (check.equals(link)) {
                    // remove link
                }
            }
        }

    }

Plain arrays hold a fixed number of object references at fixed positions. You therefore cannot remove elements; you can only replace them with other objects or with null, which would leave holes in the middle of the array. That is probably not what you want.

Since you cannot remove elements from an array, you need to replace the whole array with a new one that contains only the wanted entries. That raises a new problem: you have to specify the size of the target array up front, but you do not yet know what size you will finally need. So you either oversize the array and keep the actually used length in another variable, or use a LinkedList, which has a variable size and performs well when adding elements.
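As a rough sketch of the second option, the entries can be collected in a growable list and copied into a new array at the end. The class name GrowableListDedup and the method removeDuplicates are made up for this illustration and are not part of the question's code.

```java
import java.util.LinkedList;
import java.util.List;

class GrowableListDedup {
    // Hypothetical helper: returns a new array holding each link once, in first-seen order.
    static String[] removeDuplicates(String[] links) {
        List<String> unique = new LinkedList<>();
        for (String link : links) {
            // contains() is a linear scan, so this is still quadratic overall
            if (!unique.contains(link)) {
                unique.add(link);
            }
        }
        // The list knows its own size, so the new array is exactly as long as needed
        return unique.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] links = {"a", "b", "a", "c", "b"};
        System.out.println(String.join(", ", removeDuplicates(links))); // prints: a, b, c
    }
}
```

The input array is left untouched; the caller has to use the returned array instead.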

In addition, such nested loops tend to become very slow as the array grows. With more than about 20 entries, collecting the values in a HashSet is much quicker than such simple for loops, and it eliminates duplicates as a side effect.

A closely related class is HashMap, which stores key/value pairs: its keys are unique, but the same value may appear under several keys, so it does not eliminate duplicate values. It is worth reading up on how hashing algorithms and hash maps work. That is a very interesting topic.
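To make the HashMap behaviour a bit more concrete, here is a small sketch that counts how often each link occurs. The class name LinkCounter and the method countOccurrences are invented for this example; it only assumes the standard java.util.HashMap.

```java
import java.util.HashMap;
import java.util.Map;

class LinkCounter {
    // Hypothetical helper: maps each distinct link to the number of times it occurs.
    static Map<String, Integer> countOccurrences(String[] links) {
        Map<String, Integer> counts = new HashMap<>();
        for (String link : links) {
            // merge() stores 1 for a new key, otherwise adds 1 to the stored value
            counts.merge(link, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] links = {"a", "b", "a", "c", "b", "a"};
        Map<String, Integer> counts = countOccurrences(links);
        System.out.println(counts.get("a")); // prints: 3
        System.out.println(counts.size());   // prints: 3 (distinct keys: a, b, c)
    }
}
```

Each link appears only once as a key, while the values 1, 2 and 3 could just as well repeat.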

Example:

import java.util.Arrays;
import java.util.HashSet;

public class Main
{
    public static void main(String[] args) throws Exception
    {

        String[] links = {"a","b","a","c","b","c","d","e","f"};

        HashSet<String> set=new HashSet<>();
        set.addAll(Arrays.asList(links));

        System.out.println(set);
    }
}

Outputs:

[a, b, c, d, e, f]
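Note that HashSet makes no promise about iteration order, so the tidy order of this output should not be relied upon. If the original order of the links matters, a LinkedHashSet could be used instead. The following is only a sketch; the class name OrderedDedup and the method withoutDuplicates are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class OrderedDedup {
    // Hypothetical replacement for checkForDupes: returns a new array, leaving the input untouched.
    static String[] withoutDuplicates(String[] links) {
        // LinkedHashSet drops duplicates and remembers insertion order
        Set<String> unique = new LinkedHashSet<>(Arrays.asList(links));
        return unique.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] links = {"d", "a", "d", "c", "a"};
        System.out.println(Arrays.toString(withoutDuplicates(links))); // prints: [d, a, c]
    }
}
```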

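A set has no index, so there is no set.get(index). To read individual elements, iterate over the set with a for-each loop, or copy it into an array with set.toArray(new String[0]).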
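As a small sketch of reading elements back out of a set: a for-each loop visits every element, and copying the set into a list gives access by position. The class name ReadFromSet and the method toIndexedList are made up for this example, and a TreeSet is used only to make the order predictable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class ReadFromSet {
    // Hypothetical helper: copies the set into a list so that elements can be read by index.
    static List<String> toIndexedList(Set<String> set) {
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        // TreeSet is an assumption of this sketch: it keeps elements sorted, so the order is predictable
        Set<String> set = new TreeSet<>(Arrays.asList("b", "a", "b", "c"));

        // Option 1: visit every element with a for-each loop
        for (String link : set) {
            System.out.println(link);
        }

        // Option 2: copy into a list and read by position
        List<String> list = toIndexedList(set);
        System.out.println(list.get(0)); // prints: a
    }
}
```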

The following example shows how to achieve the same without HashSet:

import java.util.Arrays;

public class Main
{
    public static void main(String[] args) throws Exception
    {

        String[] input = {"a", "b", "a", "c", "b", "c", "d", "e", "f"};

        String[] output = new String[input.length];
        int count = 0;

        // Iterate over the input array
        for (String in : input)
        {
            // Check if the string is already in the output array
            boolean found=false;
            for (String out : output)
            {
                if (in.equals(out))
                {
                    found=true;
                    break; // break the inner for loop, no need to continue the search
                }
            }

            if (!found)
            {
                output[count++]=in;
            }
        }

        System.out.println(Arrays.toString(output));
    }
}

Outputs:

[a, b, c, d, e, f, null, null, null]

Note how I simplified the for loops. Also notice that the output array contains some unused slots. The count variable holds the number of slots that are actually in use.
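If the trailing null entries are unwanted, the used part of the array can be copied into a new array of exactly the right length with Arrays.copyOf. The sketch below wraps the same nested-loop approach in a method; the class name TrimOutput and the method removeDuplicates are assumptions for illustration only.

```java
import java.util.Arrays;

class TrimOutput {
    // Hypothetical helper: same nested-loop approach, but returns a trimmed array.
    static String[] removeDuplicates(String[] input) {
        String[] output = new String[input.length];
        int count = 0;
        for (String in : input) {
            boolean found = false;
            // Only the first `count` slots are filled, so there is no need to scan further
            for (int i = 0; i < count; i++) {
                if (in.equals(output[i])) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                output[count++] = in;
            }
        }
        // Copy the used prefix into an array of exactly `count` elements
        return Arrays.copyOf(output, count);
    }

    public static void main(String[] args) {
        String[] input = {"a", "b", "a", "c", "b", "c", "d", "e", "f"};
        System.out.println(Arrays.toString(removeDuplicates(input))); // prints: [a, b, c, d, e, f]
    }
}
```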
