
Remove duplicates from a list of String Array

I know there are a lot of questions about removing duplicates from a list. I liked the solution with HashSet. However, what I have is a list of String[], and it won't work with that, probably because stringArray1.equals(stringArray2) returns false even if the two arrays have the same contents. To compare string arrays we have to use Arrays.equals, which HashSet does not do.
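To make the problem concrete, here is a minimal sketch of the behaviour described above. The class name ArrayEqualityDemo and the sample values are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; names and values are illustrative only.
class ArrayEqualityDemo {

    // Adds two distinct arrays with identical contents to a HashSet
    // and returns the resulting size.
    static int hashSetSizeForEqualArrays() {
        String[] a = {"alice", "1"};
        String[] b = {"alice", "1"};
        Set<String[]> set = new HashSet<>();
        set.add(a);
        set.add(b);
        return set.size();
    }

    public static void main(String[] args) {
        String[] a = {"alice", "1"};
        String[] b = {"alice", "1"};
        System.out.println(a.equals(b));                  // false: arrays inherit identity-based equals
        System.out.println(Arrays.equals(a, b));          // true: element-wise comparison
        System.out.println(hashSetSizeForEqualArrays());  // 2: both arrays are kept
    }
}
```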

So I have a userList of String[] users, each with only 2 strings in it: username and userID. Since both are linked (there's only one userID per username), it would be enough for me to compare only one of those strings.

What I need is a fast way to remove duplicates from the list.

I thought about something like this:

List<String> userNamesList = new ArrayList<String>();
List<String[]> userListWithoutDuplicates = new ArrayList<String[]>();
for(String[] user : userList){
    if(!userNamesList.contains(user[0])){
        userNamesList.add(user[0]);
        userListWithoutDuplicates.add(user);
    }
}

However, this needs two new Lists and a loop (I'm pretty sure any other solution would still need this loop).

I'm wondering if there's not a better solution. I thought something like that should already be implemented somewhere.

EDIT: I get my arrays from an SQL query. I have a DB and some users. One user searches the DB for other users matching certain conditions, and the DB sends back a list of String[] {username, userID} to this user. I already have a user class, which contains far more than just the username and ID. I have one instance of this class per connected user, but the DB can't access those instances, so it can't send them. I thought a String array was an easy solution. I didn't realize that, in certain cases, a user can be referenced more than once in the DB and so be selected more than once. That's why I get duplicates in my list.

The best approach would be to map every user returned from the DB to an object holding the two strings mentioned, username and userID. Then hashCode and equals should be implemented according to your definition of equality/duplicate. Based on this, there are many ways to get rid of duplicates. You could add all found users to a Set, or stream over a list of such users and call Stream.distinct() to reduce them to unique ones:

List<User> distinctUsers = users.stream().distinct().collect(Collectors.toList());
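As a rough sketch of what such a value object might look like: the class name UserRef, its accessors, and the helper methods fromRow and distinctUsers are hypothetical, and treating userID alone as the identity is an assumption based on the question's statement that there is one userID per username.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical lightweight value object for a {username, userID} row.
final class UserRef {
    private final String username;
    private final String userID;

    UserRef(String username, String userID) {
        this.username = username;
        this.userID = userID;
    }

    String getUsername() { return username; }
    String getUserID() { return userID; }

    // Assumption: two references are duplicates when their userID matches.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof UserRef)) return false;
        return Objects.equals(userID, ((UserRef) other).userID);
    }

    // Must be consistent with equals, so it hashes the same field.
    @Override
    public int hashCode() {
        return Objects.hashCode(userID);
    }

    // Maps one DB row {username, userID} to a UserRef.
    static UserRef fromRow(String[] row) {
        return new UserRef(row[0], row[1]);
    }

    // Converts the rows and drops duplicates; distinct() relies on equals/hashCode.
    static List<UserRef> distinctUsers(List<String[]> rows) {
        return rows.stream()
                .map(UserRef::fromRow)
                .distinct()
                .collect(Collectors.toList());
    }
}
```

With equals and hashCode in place, the same objects can also go straight into a HashSet or LinkedHashSet.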

If you need to keep the current structure, you cannot use Stream.distinct(), because it would compare the string arrays by object identity. The equality has to be specified explicitly. We can do this, for example, in the following way:

Function<String[], String> comparingBy = user -> user[1]; // user[1] = ID
List<String[]> distinctUsers = users.stream()
        .collect(Collectors.groupingBy(comparingBy))
        .values().stream()
        .map(u -> u.get(0))
        .collect(Collectors.toList());

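This groups all users by the Function comparingBy. comparingBy should reflect your definition of equality, so that of two equal users one counts as a duplicate. Each group keeps its users in encounter order, so taking u.get(0) preserves the first occurrence, similar to what Stream.distinct documents ("the element appearing first in the encounter order is preserved"). Note that the order of the groups themselves is not guaranteed, because groupingBy collects into a HashMap by default. The result is a distinct list, a list without duplicates.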

Another data type would be the Set mentioned above. When creating a TreeSet it's also possible to provide the definition of equality explicitly, through a Comparator. We can use the same comparingBy as above:

Set<String[]> distinctUsers = new TreeSet<>(Comparator.comparing(comparingBy));
distinctUsers.addAll(users);

If you are using Java 8, you can use a stream:

String[] arrWithDuplicates = new String[]{"John", "John", "Mary", "Paul"};
String[] arrWithoutDuplicates = Arrays.stream(arrWithDuplicates).distinct().toArray(String[]::new);

In arrWithoutDuplicates you'll have "John", "Mary" and "Paul"

Edited: converted userNamesList to a HashSet, thanks @Aris_Kortex. This can reduce the complexity from O(n^2) to O(n), because a lookup in a HashSet takes O(1) on average.

    Set<String> userSet = new HashSet<>(userNamesList);
    List<String[]> userListWithoutDuplicates = userList.stream()
        .filter(user -> !userSet.contains(user[0]))
        .collect(Collectors.toList());

distinct() on the stream does not help here: it compares elements with equals, and for String[] that means object identity, so it only removes repeated references to the same array object, not arrays whose 0th and 1st elements happen to match those of another array.

But as I understand it, the asker would like to remove only those users whose names (the 0th element) are contained in some predefined list.
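If the goal is instead what the question's loop does, dropping later entries whose user name has already been seen in userList itself, the same HashSet idea can be applied in a single pass. This is only a sketch: the class name SeenSetDedup and the method name removeDuplicateNames are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; names are illustrative only.
class SeenSetDedup {

    // Keeps the first entry for each user name (index 0) and preserves list order.
    static List<String[]> removeDuplicateNames(List<String[]> userList) {
        Set<String> seenNames = new HashSet<>();
        List<String[]> result = new ArrayList<>();
        for (String[] user : userList) {
            // Set.add returns false if the name was already present,
            // so one call does both the lookup and the insertion.
            if (seenNames.add(user[0])) {
                result.add(user);
            }
        }
        return result;
    }
}
```

Compared with the List.contains version in the question, each membership check is a hash lookup rather than a scan of the list.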

I certainly think that you should use a Set rather than a List in the first place. This can be adapted to your time and space complexity requirements. Here is a simple two-line answer for your code:

        Set<String> set = new HashSet<>(userNamesList);
        List<String> list = new ArrayList<>(set);

A working example can be run here: https://ideone.com/JznZCE. It really depends on what you need to achieve. If your users are unique, you should simply use a set rather than a list. Also, if the info is held in a user object instead of a String, the order of users need not be changed by this, and users can be ordered by ID or name later.

You can then change how equality is determined by overriding the equals and hashCode methods of the User class with a custom implementation.

Hope this helps!

Edit: If the info comes from a DB, see whether you can get a unique list by using the "DISTINCT" keyword (or a similar MySQL construct), to keep this logic out of your code.

You can use the toMap collector to provide a custom keyMapper function which serves as a uniqueness test, then simply use the values of the map as your result.

For your uniqueness test, I think it makes more sense to use index 1 (the userID) instead of index 0 (the userName). However, if you wish to change it back, use arr[0] instead of arr[1] below:

List<String[]> userList = new ArrayList<>();
userList.add(new String[]{"George","123"});
userList.add(new String[]{"George","123"});
userList.add(new String[]{"George","456"});
List<String[]> userListNoDupes = new ArrayList<>(userList.stream()
    .collect(Collectors.toMap(arr-> arr[1], Function.identity(), (a,b)-> a)).values());
for(String[] user: userListNoDupes) {
    System.out.println(Arrays.toString(user));
}

Output:

[George, 123]
[George, 456]
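One caveat: the three-argument toMap makes no promise about the iteration order of the resulting map, so the order of the de-duplicated list is not guaranteed in general. If the original order matters, a possible variation is the four-argument overload with a LinkedHashMap supplier. The class name OrderedToMapDedup and the method name dedupeById below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper; names are illustrative only.
class OrderedToMapDedup {

    // De-duplicates by userID (index 1), keeping the first occurrence
    // and the original encounter order.
    static List<String[]> dedupeById(List<String[]> userList) {
        Map<String, String[]> byId = userList.stream()
                .collect(Collectors.toMap(
                        arr -> arr[1],              // key: userID
                        Function.identity(),        // value: the row itself
                        (first, second) -> first,   // on a key clash keep the earlier row
                        LinkedHashMap::new));       // insertion-ordered map
        return new ArrayList<>(byId.values());
    }
}
```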

Check this topic: Removing duplicate elements from a List

You can convert the list into a Set (which doesn't allow duplicates) and then back into a List if you really need this type of collection.
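For a list of String[] specifically, a plain round trip through a Set would still compare the arrays by identity. One way to make the idea work, sketched here under the assumption that two rows are duplicates when all of their elements match, is to wrap each array with Arrays.asList so the set sees value-based equals and hashCode. The class name SetRoundTrip and the method name dedupe are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; names are illustrative only.
class SetRoundTrip {

    // List -> Set -> List, using List<String> wrappers so that rows
    // with equal contents collapse into one entry.
    static List<String[]> dedupe(List<String[]> userList) {
        // LinkedHashSet keeps the first occurrence and the insertion order.
        Set<List<String>> unique = new LinkedHashSet<>();
        for (String[] user : userList) {
            unique.add(Arrays.asList(user)); // List.equals compares element by element
        }
        List<String[]> result = new ArrayList<>();
        for (List<String> user : unique) {
            result.add(user.toArray(new String[0]));
        }
        return result;
    }
}
```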


 