
Remove duplicates from a list of String arrays

I know there are a lot of questions about "removing duplicates from a list". I liked the solution with HashSet. However, what I have is a list of String[], and HashSet won't work with it, probably because stringArray1.equals(stringArray2) returns false even if the two arrays hold the same strings. To compare string arrays we have to use Arrays.equals, which HashSet does not do.
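
The behaviour described above can be reproduced with a minimal sketch. The class name ArrayEqualityDemo and the sample values are made up for illustration; only standard java.util calls are used.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ArrayEqualityDemo {
    // Returns how many elements a HashSet keeps when it is given
    // two distinct array objects with identical contents.
    static int hashSetSizeForEqualArrays() {
        String[] a = {"alice", "1"};
        String[] b = {"alice", "1"};
        Set<String[]> set = new HashSet<>();
        set.add(a);
        set.add(b);
        return set.size();
    }

    public static void main(String[] args) {
        String[] a = {"alice", "1"};
        String[] b = {"alice", "1"};
        System.out.println(a.equals(b));                 // false: arrays inherit identity-based equals
        System.out.println(Arrays.equals(a, b));         // true: element-wise comparison
        System.out.println(hashSetSizeForEqualArrays()); // 2: both arrays are kept
    }
}
```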

So I have a userList of String[] users, each with only 2 strings in it: username and userID. Since the two are linked (there's only one userID per username), it would be enough for me to compare only one of those strings.

What I need is a fast way to remove duplicates from the list.

I thought about something like this:

List<String> userNamesList = new ArrayList<String>();
List<String[]> userListWithoutDuplicates = new ArrayList<String[]>();
for(String[] user : userList){
    if(!userNamesList.contains(user[0])){
        userNamesList.add(user[0]);
        userListWithoutDuplicates.add(user);
    }
}

However, this needs two new Lists and a loop (I'm pretty sure any other solution would still need this loop).
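
One possible refinement of the loop above, sketched under the assumption that the username sits at index 0: tracking seen names in a HashSet avoids the linear List.contains scan on every iteration. The class and method names (DedupeByName, dedupe) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DedupeByName {
    // Keeps the first array seen for each username (index 0), in encounter order.
    static List<String[]> dedupe(List<String[]> userList) {
        Set<String> seenNames = new HashSet<>();
        List<String[]> result = new ArrayList<>();
        for (String[] user : userList) {
            // Set.add returns false when the name is already present.
            if (seenNames.add(user[0])) {
                result.add(user);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> users = Arrays.asList(
                new String[]{"alice", "1"},
                new String[]{"bob", "2"},
                new String[]{"alice", "1"});
        for (String[] user : dedupe(users)) {
            System.out.println(Arrays.toString(user));
        }
    }
}
```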

I'm wondering if there's a better solution. I thought something like that should already be implemented somewhere.

EDIT: I got my array from an SQL query. In fact, I have a DB and some users. One user searches the DB for other users matching certain conditions, and the DB sends back a list of String[] {username, userID} to this user. So I already have a user class, which contains far more than only username and ID. I have one instance of this class per connected user, but the DB can't access those instances, so it can't send them. I thought a String array was an easy solution. I didn't think that, in certain cases, a user can be referenced more than once in the DB and so be selected more than once. That's why I got duplicates in my list.

The best approach would be to map every user returned from the DB to an object with the two mentioned strings, username and userID. Then hashCode and equals should be implemented according to your definition of equality/duplicate. Based on this there are many ways to get rid of duplicates. You could add all found users to a Set, or stream over a list of such users and call Stream.distinct() to reduce the users to unique ones:

List<User> distinctUsers = users.stream().distinct().collect(Collectors.toList());
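
A minimal sketch of what such an object could look like, assuming that two users count as equal when their userID matches. The nested User class here is a hypothetical stand-in, not the asker's existing user class, and distinctUsers is an illustrative helper name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class UserDistinctDemo {
    // Hypothetical value class; equality is defined by userID only (an assumption).
    static final class User {
        final String username;
        final String userID;

        User(String username, String userID) {
            this.username = username;
            this.userID = userID;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof User)) return false;
            return userID.equals(((User) o).userID);
        }

        @Override
        public int hashCode() {
            // Must be consistent with equals: derived from userID only.
            return Objects.hash(userID);
        }
    }

    // Maps each {username, userID} row to a User and drops duplicates.
    static List<User> distinctUsers(List<String[]> rows) {
        return rows.stream()
                .map(row -> new User(row[0], row[1]))
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[]{"alice", "1"},
                new String[]{"alice", "1"},
                new String[]{"bob", "2"});
        for (User user : distinctUsers(rows)) {
            System.out.println(user.username + " " + user.userID);
        }
    }
}
```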

If you need to go on with the current structure, you cannot use Stream.distinct(), as it would compare string arrays by their object identity. The equality has to be specified explicitly. We can do this, e.g., in the following way:

Function<String[], String> comparingBy = user -> user[1]; // user[1] = ID
List<String[]> distinctUsers = users.stream()
        .collect(Collectors.groupingBy(comparingBy))
        .values().stream()
        .map(u -> u.get(0))
        .collect(Collectors.toList());

This will group all users by the Function comparingBy. comparingBy should reflect your definition of equality, so that of two equal users one is a duplicate. As with Stream.distinct, where "the element appearing first in the encounter order is preserved", taking the first element of each group keeps the first occurrence of each user. The result is a distinct list, a list without duplicates.
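
The single-argument groupingBy makes no promise about the iteration order of the map it returns, so the order of the resulting list may differ from the input. If input order matters, one possible variant (a sketch, not part of the original answer) passes LinkedHashMap::new as the map factory. The wrapper class GroupingDistinctDemo and method distinctById are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class GroupingDistinctDemo {
    // Keeps the first array per ID (index 1), preserving the order of first occurrence.
    static List<String[]> distinctById(List<String[]> users) {
        Function<String[], String> comparingBy = user -> user[1]; // user[1] = ID
        return users.stream()
                // LinkedHashMap keeps groups in the order their keys were first seen.
                .collect(Collectors.groupingBy(comparingBy, LinkedHashMap::new, Collectors.toList()))
                .values().stream()
                .map(group -> group.get(0))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String[]> users = Arrays.asList(
                new String[]{"George", "456"},
                new String[]{"George", "123"},
                new String[]{"George", "456"});
        for (String[] user : distinctById(users)) {
            System.out.println(Arrays.toString(user));
        }
    }
}
```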

Another data type would be the mentioned Set. When creating a TreeSet, it's also possible to provide the definition of equality explicitly. We can use the same comparingBy as above:

Set<String[]> distinctUsers = new TreeSet<>(Comparator.comparing(comparingBy));
distinctUsers.addAll(users);
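
A self-contained version of the TreeSet idea might look like the sketch below. Two points are worth keeping in mind: a TreeSet treats elements as equal when the comparator returns 0, and it iterates in comparator order (here, the IDs sorted as strings) rather than in input order. The class name TreeSetDistinctDemo and the sample data are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

class TreeSetDistinctDemo {
    // Collects users into a TreeSet whose notion of equality is "same ID" (index 1).
    static Set<String[]> distinctById(List<String[]> users) {
        Function<String[], String> comparingBy = user -> user[1]; // user[1] = ID
        Set<String[]> distinctUsers = new TreeSet<>(Comparator.comparing(comparingBy));
        // add() ignores an element if an equal one (same ID) is already present,
        // so the first array added for each ID is the one that stays.
        distinctUsers.addAll(users);
        return distinctUsers;
    }

    public static void main(String[] args) {
        List<String[]> users = Arrays.asList(
                new String[]{"bob", "456"},
                new String[]{"alice", "123"},
                new String[]{"alice-again", "123"});
        for (String[] user : distinctById(users)) {
            System.out.println(Arrays.toString(user));
        }
    }
}
```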

If you are using Java 8, you can use a stream:

String[] arrWithDuplicates = new String[]{"John", "John", "Mary", "Paul"};
String[] arrWithoutDuplicates = Arrays.stream(arrWithDuplicates).distinct().toArray(String[]::new);

In arrWithoutDuplicates you'll have "John", "Mary" and "Paul".

Edited: converted userNamesList to HashSet, thanks @Aris_Kortex. This can reduce the complexity from O(n^2) to O(n), because a lookup in a HashSet takes O(1) on average.

    Set<String> userSet = new HashSet<>(userNamesList);
    List<String[]> userListWithoutDuplicates = userList.stream()
        .filter(user -> !userSet.contains(user[0]))
        .collect(Collectors.toList());

distinct() on the stream does not help here: it removes elements that are equal according to equals, and arrays compare by identity, so two arrays whose 0th and 1st elements match are still treated as different elements.
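
To illustrate how distinct() treats arrays, here is a small sketch. Stream.distinct() relies on equals, which for arrays is identity-based, so wrapping each array in a List (whose equals and hashCode are element-wise) is one way to compare by content instead. The class and method names are hypothetical, and this compares all elements rather than just the username.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ListWrapperDistinctDemo {
    // Counts what plain distinct() leaves when applied directly to arrays.
    static long countAfterPlainDistinct(List<String[]> users) {
        return users.stream().distinct().count();
    }

    // Compares arrays by content by wrapping each one in a List first.
    static List<String[]> distinctByContent(List<String[]> users) {
        return users.stream()
                .map(user -> Arrays.asList(user))          // List<String> compares element-wise
                .distinct()
                .map(list -> list.toArray(new String[0]))  // back to String[]
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String[]> users = Arrays.asList(
                new String[]{"alice", "1"},
                new String[]{"alice", "1"},
                new String[]{"bob", "2"});
        System.out.println(countAfterPlainDistinct(users));  // 3: nothing removed
        System.out.println(distinctByContent(users).size()); // 2: content duplicates removed
    }
}
```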

But as I understand it, the asker would like to remove only those users whose name (0th element) is contained in some predefined list.

I certainly think that you should use a Set rather than a List in the first place. This can be adapted to your time and space requirements. Here is a simple two-line answer for your code:

        Set<String> set = new HashSet<>(userNamesList);
        List<String> list = new ArrayList<>(set);

A working example runs here: https://ideone.com/JznZCE. It really depends on what you need to achieve. If your users are unique, you should simply use a set rather than a list. Also, if the info is contained in a user object instead of a "String", the order of users need not be changed by this, and the users can later be ordered by id or name.

You can then change how equality is determined by overriding the equals and hashCode methods of the User class with a custom implementation.

Hope this helps!

Edit: If the info comes from a DB, see how you can get a unique list by using the "DISTINCT" keyword (or a similar MySQL construct), to handle this logic outside your code.

You can use the toMap collector to provide a custom keyMapper function which serves as a uniqueness test, then simply use the values of the map as your result.

For your uniqueness test, I think it makes more sense to use index 1 (the userID) instead of index 0 (the userName). However, if you wish to change it back, use arr[0] instead of arr[1] below:

List<String[]> userList = new ArrayList<>();
userList.add(new String[]{"George","123"});
userList.add(new String[]{"George","123"});
userList.add(new String[]{"George","456"});
List<String[]> userListNoDupes = new ArrayList<>(userList.stream()
    .collect(Collectors.toMap(arr-> arr[1], Function.identity(), (a,b)-> a)).values());
for(String[] user: userListNoDupes) {
    System.out.println(Arrays.toString(user));
}

Output:

[George, 123]

[George, 456]
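
The three-argument toMap does not promise any particular map implementation, so the order of values() is not guaranteed in general. If the order of first occurrence should be kept, one option (a sketch, not part of the original answer) is the four-argument overload with LinkedHashMap::new as the map supplier. The class name ToMapDistinctDemo and method distinctById are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class ToMapDistinctDemo {
    // Keeps the first array per userID (index 1), in order of first occurrence.
    static List<String[]> distinctById(List<String[]> userList) {
        Map<String, String[]> byId = userList.stream()
                .collect(Collectors.toMap(
                        arr -> arr[1],              // key: userID
                        Function.identity(),        // value: the array itself
                        (first, second) -> first,   // on duplicate keys keep the first
                        LinkedHashMap::new));       // preserves insertion order
        return new ArrayList<>(byId.values());
    }

    public static void main(String[] args) {
        List<String[]> userList = Arrays.asList(
                new String[]{"George", "123"},
                new String[]{"George", "123"},
                new String[]{"George", "456"});
        for (String[] user : distinctById(userList)) {
            System.out.println(Arrays.toString(user));
        }
    }
}
```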

Check this topic: Removing duplicate elements from a List

You can convert the list into a Set (which doesn't allow duplicates) and then back into a List if you really need this type of collection.
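
As a rough sketch of that round trip: the example below uses LinkedHashSet so the original order survives (a plain HashSet would not guarantee that). It works for elements with content-based equals such as String; a List of String[] would not be de-duplicated this way, because arrays compare by identity. The class name SetRoundTripDemo is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SetRoundTripDemo {
    // List -> Set -> List, keeping the first occurrence of each element in order.
    static List<String> withoutDuplicates(List<String> names) {
        Set<String> unique = new LinkedHashSet<>(names);
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("John", "John", "Mary", "Paul");
        System.out.println(withoutDuplicates(names)); // [John, Mary, Paul]
    }
}
```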
