Remove duplicates of a String Array by looking at a specific part of a String only in Java

What I have is an array of String objects called Id. I loop through the Id objects to get the email address associated with each Id and concatenate the two. I now want to remove from the array any String whose email address duplicates an earlier one, comparing only the email part and ignoring the Id in the String.

For example, the String array contains the following Ids:

{1, 2, 3}

After concatenating the email address associated with each Id, the String array becomes:

{1 - test@gmail.com, 2 - any@gmail.com, 3 - test@gmail.com}

So I need to remove the duplicate emails (along with the Id concatenated to each) to get this:

{1 - test@gmail.com, 2 - any@gmail.com}

After that I will strip the email addresses using split, giving the final result:

{1, 2}
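The split step could look roughly like the minimal sketch below. It assumes every entry has exactly the "id - email" shape built by the concatenation; ExtractIds and extractIds are illustrative names, not part of the original code.

```java
import java.util.Arrays;

public class ExtractIds {

    // Takes strings of the (assumed) form "id - email" and keeps only the part
    // before the first " - " separator.
    static String[] extractIds(String[] idEmailPairs) {
        String[] ids = new String[idEmailPairs.length];
        for (int i = 0; i < idEmailPairs.length; i++) {
            // limit 2 splits on the first separator only, so index 0 is always the id
            ids[i] = idEmailPairs[i].split(" - ", 2)[0];
        }
        return ids;
    }

    public static void main(String[] args) {
        String[] deduped = {"1 - test@gmail.com", "2 - any@gmail.com"};
        System.out.println(Arrays.toString(extractIds(deduped))); // prints [1, 2]
    }
}
```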

So the issue I'm having is that the deduplication, of course, compares the whole string including both the email and the Id, but I need it to compare only the email address and then remove the whole String from the array.

This is the code I have so far:

//Remove all duplicate email addresses from list
ArrayList<String> duplicateEmails = new ArrayList<String>();

//loop through cform.getToConsumers(), a String[] array of Ids: find the email address
//of each Id, concatenate it to the Id and add the result to the ArrayList
for (String conId : cform.getToConsumers()){
    Long consId = Long.parseLong(conId);
    Consumer cons = af.getSingleConsumerId(consId);
    duplicateEmails.add(conId + " - " + cons.getEmail());
}

//convert ArrayList to String array
String[] stringArray = duplicateEmails.toArray(new String[0]);
//remove the duplicates: this compares the whole "id - email" string,
//so entries with the same email but different Ids are all kept
Set<String> findDuplicates = new HashSet<String>(Arrays.asList(stringArray));
String[] removedEmails = findDuplicates.toArray(new String[0]);
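If you want to stay with plain strings, one option is to key an insertion-ordered map by only the email part of each string, so the first entry for each email wins. This is a sketch that assumes the " - " separator used in the loop; dedupByEmail is a hypothetical helper name, not a library method.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DedupByEmail {

    // Keeps the first "id - email" string for each distinct email and drops later
    // strings whose email part was already seen. LinkedHashMap keeps the original order.
    static String[] dedupByEmail(String[] idEmailPairs) {
        Map<String, String> firstByEmail = new LinkedHashMap<>();
        for (String pair : idEmailPairs) {
            int sep = pair.indexOf(" - ");
            // if the separator is missing, fall back to comparing the whole string
            String email = sep >= 0 ? pair.substring(sep + 3) : pair;
            firstByEmail.putIfAbsent(email, pair); // only stores the first pair per email
        }
        return firstByEmail.values().toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] input = {"1 - test@gmail.com", "2 - any@gmail.com", "3 - test@gmail.com"};
        // prints: 1 - test@gmail.com, 2 - any@gmail.com
        System.out.println(String.join(", ", dedupByEmail(input)));
    }
}
```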

Instead of writing Strings to the array, make an array of your own objects. Example:

class MyPair{
    String id;
    String email;
    @Override
    public String toString(){
        return id + " - " + email;
    }
}

Then you can compare emails without parsing the whole String again.

Full example with overridden hashCode() and equals(Object):

package main;

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class Test {

    public static void main(String[] args) {
        new Test().deduplicate();
    }

    public class MyPair {
        public final String id;
        public final String email;

        public MyPair(String id, String email) {
            this.id = id;
            this.email = email;
        }

        @Override
        public String toString() {
            return id + " - " + email;
        }

        // two pairs are equal when their emails are equal; the id is ignored
        @Override
        public boolean equals(Object obj) {
            if (obj == this)
                return true;
            if (!(obj instanceof MyPair))
                return false;
            return this.email.equals(((MyPair) obj).email);
        }

        // must be consistent with equals, so it hashes the email only
        @Override
        public int hashCode() {
            return this.email.hashCode();
        }
    }

    public void deduplicate() {
        List<MyPair> pairs = new ArrayList<MyPair>();

        // test data
        MyPair p1 = new MyPair("1", "asd@asd.com");
        MyPair p2 = new MyPair("2", "qwe@qwe.com");
        MyPair p3 = new MyPair("3", "asd@asd.com");
        pairs.add(p1);
        pairs.add(p2);
        pairs.add(p3);

        // just to demonstrate the overridden methods
        System.out.println(p1.equals(p2));
        System.out.println(p1.equals(p3));
        System.out.println(p2.equals(p3));
        System.out.println(p1.hashCode());
        System.out.println(p2.hashCode());
        System.out.println(p3.hashCode());

        // dedup will be done by HashSet
        // This only works because we have overridden
        // hashCode and equals!
        HashSet<MyPair> deduped = new HashSet<MyPair>();
        for (MyPair pair : pairs) {
            deduped.add(pair);
        }

        System.out.println(deduped);

    }
}

Output of the final println: [2 - qwe@qwe.com, 1 - asd@asd.com] (note the changed order! A HashSet does not keep insertion order; its iteration order depends on the hash codes.)
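If the original order matters, replacing HashSet with LinkedHashSet keeps insertion order while still relying on the same email-only equals/hashCode. A minimal sketch; OrderedDedup and Pair are illustrative names standing in for the answer's Test and MyPair.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class OrderedDedup {

    // Same idea as MyPair: equality and hashing use only the email.
    static final class Pair {
        final String id;
        final String email;

        Pair(String id, String email) {
            this.id = id;
            this.email = email;
        }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof Pair)) return false;
            return email.equals(((Pair) obj).email);
        }

        @Override
        public int hashCode() {
            return email.hashCode();
        }

        @Override
        public String toString() {
            return id + " - " + email;
        }
    }

    // LinkedHashSet iterates in insertion order; adding an element that is already
    // present has no effect, so the first pair seen for each email is kept.
    static Set<Pair> dedup(List<Pair> pairs) {
        return new LinkedHashSet<>(pairs);
    }

    public static void main(String[] args) {
        List<Pair> pairs = Arrays.asList(
                new Pair("1", "asd@asd.com"),
                new Pair("2", "qwe@qwe.com"),
                new Pair("3", "asd@asd.com"));
        System.out.println(dedup(pairs)); // prints [1 - asd@asd.com, 2 - qwe@qwe.com]
    }
}
```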

You can do it like the following, using a LinkedHashMap keyed by email so that only the first Id for each email is kept:

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;

public class Testing {

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order; key = email, value = Id
        Map<String, String> hm = new LinkedHashMap<String, String>();

        String[] values = {"1","2","3"};
        ArrayList<String> ar = new ArrayList<String>();
        ar.add("test@gmail.com");
        ar.add("any@gmail.com");
        ar.add("test@gmail.com");

        // values[i] is the Id that belongs to ar.get(i);
        // only the first Id seen for each email is stored
        for (int i = 0; i < ar.size(); i++) {
            if (!hm.containsKey(ar.get(i)))
                hm.put(ar.get(i), values[i]);
        }
        // print only the Ids (the map's values), not the whole map
        System.out.println(hm.values());

    }

}

Output:

[1, 2]
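On Java 8 or later, the same first-wins, order-preserving idea can be written with streams and returned directly as the String[] of Ids the question asks for. This is a sketch assuming the Ids and emails are held in two parallel arrays of equal length; uniqueIdsByEmail is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class StreamDedup {

    // ids[i] belongs to emails[i]; both arrays are assumed to have the same length.
    static String[] uniqueIdsByEmail(String[] ids, String[] emails) {
        Map<String, String> idByEmail = IntStream.range(0, ids.length)
                .boxed()
                .collect(Collectors.toMap(
                        i -> emails[i],          // key: the email
                        i -> ids[i],             // value: the Id
                        (first, later) -> first, // duplicate email: keep the first Id
                        LinkedHashMap::new));    // keep insertion order
        return idByEmail.values().toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] ids = {"1", "2", "3"};
        String[] emails = {"test@gmail.com", "any@gmail.com", "test@gmail.com"};
        System.out.println(String.join(", ", uniqueIdsByEmail(ids, emails))); // prints 1, 2
    }
}
```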

Why not use three arrays, keeping the indexes of the duplicated emails in the third array? Then go through the first (Id) array and remove those indexes from it.

So you have an Id array, an email array, and an array of indexes to remove.

Then drop your references to those arrays so they can be garbage-collected :)

That way you don't need to concatenate strings and then search them.

Just an idea.
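A minimal sketch of the three-array idea, assuming the Ids and emails sit in parallel arrays of equal length. As a small variation, the third array here holds one boolean flag per index instead of a list of index numbers, which makes the removal pass a direct lookup; IndexDedup and removeDuplicateEmailIds are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IndexDedup {

    // ids[i] and emails[i] are assumed to describe the same consumer (parallel arrays).
    static String[] removeDuplicateEmailIds(String[] ids, String[] emails) {
        // Third array: one flag per index, true if that index should be removed.
        boolean[] remove = new boolean[emails.length];
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < emails.length; i++) {
            // add() returns false when the email was already seen at a lower index
            if (!seen.add(emails[i])) {
                remove[i] = true;
            }
        }
        // Java arrays cannot shrink, so copy the Ids that are not flagged.
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < ids.length; i++) {
            if (!remove[i]) {
                kept.add(ids[i]);
            }
        }
        return kept.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] ids = {"1", "2", "3"};
        String[] emails = {"test@gmail.com", "any@gmail.com", "test@gmail.com"};
        System.out.println(Arrays.toString(removeDuplicateEmailIds(ids, emails))); // prints [1, 2]
    }
}
```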
