
Check whether list2.containsAll(list1) when the Strings are not exactly the same

I have List1 and List2. If every String in List1 exists in List2 in a similar form (see below), I want to get true.

The problem is this:

List1:
1. iscat
2. ishooman
3. isdoge

List2:
1. is_Cat
2. is_Hooman
3. is_doge

Logically, list2.containsAll(list1) returns false because the Strings are not equal.

How would I check for similar Strings? I can think of regex, but I don't have an explicit pattern in mind right now, and I do not know how to use regex in Java yet.

Thanks

commons-collections4 has a CollectionUtils.isEqualCollection() method that takes an Equator (which plays a role similar to the equals() method) as input.

You can write an Equator that treats these kinds of strings as equal, and then call isEqualCollection():

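    // Needs org.apache.commons.collections4.Equator and CollectionUtils,
    // and org.apache.commons.lang3.StringUtils.
    Equator<String> equator = new Equator<String>() {
        @Override
        public boolean equate(String o1, String o2) {
            // Equal when the strings match after dropping underscores, ignoring case
            return StringUtils.equalsIgnoreCase(o1.replace("_", ""), o2.replace("_", ""));
        }

        @Override
        public int hash(String o) {
            // Must agree with equate(): hash the same normalized form
            return o.replace("_", "").toLowerCase().hashCode();
        }
    };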
    List<String> a = new ArrayList<>();
    List<String> b = new ArrayList<>();
    a.add("iscat");
    b.add("is_Cat");
    System.out.println(CollectionUtils.isEqualCollection(a, b, equator));

There is also a similar CollectionUtils.removeAll() method. I just noticed that you only need containsAll(), so you can use removeAll() instead: if nothing is left after removing B's elements from A, then B contains all of A.
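The "remove everything that has a match, then check that nothing is left" idea can also be sketched with the standard library only. This is a minimal sketch, not the commons-collections4 API: the class RemoveAllCheck and the helpers normalize and containsAllSimilar are hypothetical names, and the normalization rule (drop underscores, lowercase) is assumed from the example data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class RemoveAllCheck {

    // Hypothetical helper: drop underscores and lowercase the string.
    public static String normalize(String s) {
        return s.replace("_", "").toLowerCase(Locale.ROOT);
    }

    // Returns true if every element of a has an equivalent element in b.
    public static boolean containsAllSimilar(List<String> b, List<String> a) {
        List<String> remaining = new ArrayList<>(a);
        // Remove each element of a that matches some element of b after normalization.
        remaining.removeIf(x -> b.stream().anyMatch(y -> normalize(x).equals(normalize(y))));
        // Nothing left means b "contains all" of a.
        return remaining.isEmpty();
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("iscat", "ishooman", "isdoge");
        List<String> list2 = Arrays.asList("is_Cat", "is_Hooman", "is_doge");
        System.out.println(containsAllSimilar(list2, list1)); // prints true
    }
}
```

The copy into a new ArrayList keeps the caller's list untouched, which also matters because lists from Arrays.asList do not support removal.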

There is a library on GitHub that can check String similarity; you can use it.

It seems to implement the Jaro-Winkler similarity and distance algorithm. Check this example:

import info.debatty.java.stringsimilarity.*;

public class MyApp {


    public static void main(String[] args) {
        JaroWinkler jw = new JaroWinkler();

        // substitution of s and t
        System.out.println(jw.similarity("My string", "My tsring"));

        // substitution of s and n
        System.out.println(jw.similarity("My string", "My ntrisg"));
    }
}

Output:

0.9740740656852722

0.8962963223457336

You could iterate over your lists, call this library for each pair of strings, and save the results to compare later.

java-string-similarity
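Iterating over the lists and comparing scores might look like the sketch below. To stay self-contained it does not call the library: the similarity method is a stand-in based on edit distance (an assumption, not Jaro-Winkler), and ThresholdMatch, allHaveSimilar and the 0.6 threshold are hypothetical choices for illustration. With the library on the classpath, the similarity method of a JaroWinkler instance could presumably be passed as the metric instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

public class ThresholdMatch {

    // Stand-in metric (an assumption, not the library's Jaro-Winkler):
    // 1 - editDistance / maxLength, so identical strings score 1.0.
    public static double similarity(String s, String t) {
        int[] prev = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            int[] cur = new int[t.length() + 1];
            cur[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            prev = cur;
        }
        int max = Math.max(s.length(), t.length());
        return max == 0 ? 1.0 : 1.0 - (double) prev[t.length()] / max;
    }

    // True if every string in wanted has at least one candidate scoring >= threshold.
    public static boolean allHaveSimilar(List<String> wanted, List<String> candidates,
                                         ToDoubleBiFunction<String, String> metric,
                                         double threshold) {
        return wanted.stream().allMatch(w ->
                candidates.stream().anyMatch(c -> metric.applyAsDouble(w, c) >= threshold));
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("iscat", "ishooman", "isdoge");
        List<String> list2 = Arrays.asList("is_Cat", "is_Hooman", "is_doge");
        // 0.6 is an arbitrary illustrative threshold; tune it for real data.
        System.out.println(allHaveSimilar(list1, list2, ThresholdMatch::similarity, 0.6));
    }
}
```

A threshold-based check is fuzzier than normalizing the strings: it tolerates typos, but it can also accept pairs that merely look alike, so the threshold needs testing against real data.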

Well, you can check whether one string contains all the letters of the string at the same position in the other list (in either direction):

// Needs java.util.Set and java.util.stream.Collectors
for (int i = 0; i < list1.size(); i++) {
    // Collect the lowercase characters of each string into a set
    Set<Integer> c1 = list1.get(i).toLowerCase().chars().boxed().collect(Collectors.toSet());
    Set<Integer> c2 = list2.get(i).toLowerCase().chars().boxed().collect(Collectors.toSet());
    if (c1.containsAll(c2) || c2.containsAll(c1)) {
        // then they are similar
    }
}

This checks whether the set of characters of one string is contained in that of the other. It compares the lists position by position, so both lists must be in the same order.
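The character containment idea can be wrapped in a small helper to see what it accepts. This is only a sketch: CharContainment, charSet and similar are hypothetical names. Note that the check ignores character order and repetition, so anagrams such as "dog" and "god" also count as similar.

```java
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class CharContainment {

    // Set of the lowercase characters (as int code units) that occur in s.
    public static Set<Integer> charSet(String s) {
        return s.toLowerCase(Locale.ROOT).chars().boxed().collect(Collectors.toSet());
    }

    // True if the characters of one string are a subset of the other's.
    public static boolean similar(String a, String b) {
        Set<Integer> ca = charSet(a);
        Set<Integer> cb = charSet(b);
        return ca.containsAll(cb) || cb.containsAll(ca);
    }

    public static void main(String[] args) {
        System.out.println(similar("iscat", "is_Cat"));  // true
        System.out.println(similar("dog", "god"));       // true, although the words differ
        System.out.println(similar("iscat", "isdoge"));  // false
    }
}
```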

Try something like this:

List<String> l1 = Arrays.asList("iscat", "ishooman", "isdoge");
List<String> l2 = Arrays.asList("is_Cat", "is_Hooman", "is_doge");

System.out.println(l2.stream().map(s->s.toLowerCase().replace("_", "")).collect(Collectors.toList()).containsAll(l1));

The code above uses a stream to map each string in l2 to the required form with s -> s.toLowerCase().replace("_", ""). You can add further logic to that mapping if more changes are needed.
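As one example of "further logic", the mapping could use a regular expression to strip every character that is not a letter or digit, which also touches on the regex part of the question. This is a sketch under assumptions: NormalizedContains, normalize and containsAllNormalized are hypothetical names, and the pattern [^a-z0-9] is just one possible rule.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

public class NormalizedContains {

    // Hypothetical rule: lowercase, then drop everything except a-z and 0-9.
    public static String normalize(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    // True if every normalized needle occurs among the normalized haystack entries.
    public static boolean containsAllNormalized(List<String> haystack, List<String> needles) {
        Set<String> normalized = haystack.stream()
                .map(NormalizedContains::normalize)
                .collect(Collectors.toSet());
        return needles.stream()
                .map(NormalizedContains::normalize)
                .allMatch(normalized::contains);
    }

    public static void main(String[] args) {
        List<String> l1 = Arrays.asList("iscat", "ishooman", "isdoge");
        List<String> l2 = Arrays.asList("is-Cat", "is Hooman", "is_doge");
        System.out.println(containsAllNormalized(l2, l1)); // prints true
    }
}
```

Normalizing both sides means the check still works if list1 itself contains separators or uppercase letters, and collecting into a Set keeps each lookup cheap for larger lists.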


Hope this helps!

Imagine you take one element of list2, turn it to lowercase, remove the _, and check whether the result is present in list1. If you repeat that for every element of list2 and keep only the elements that pass, two things can happen:

  1. the resulting list has the same size as list1, meaning all list2 elements are in list1
  2. the resulting list has a different size than list1, meaning at least one element of list2 is not present in list1

List<String> myList = Arrays.asList("iscat", "ishooman", "isdoge");
List<String> myList2 = Arrays.asList("is_Cat", "is_Hooman", "is_Doge");
List<String> myListResult = myList2.stream()
        .filter(x -> myList.contains(x.toLowerCase().replace("_", "")))
        .collect(Collectors.toList());

System.out.println(myListResult.size() == myList.size());
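The size comparison above assumes that neither list contains duplicates or extra entries. A sketch that counts distinct normalized matches instead is shown below; DistinctMatchCount and allPresent are hypothetical names, and the normalization is the same lowercase-and-strip-underscore rule.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DistinctMatchCount {

    // True if every distinct string of list1 appears in list2 after normalization.
    public static boolean allPresent(List<String> list1, List<String> list2) {
        Set<String> wanted = new HashSet<>(list1);
        long matched = list2.stream()
                .map(x -> x.toLowerCase(Locale.ROOT).replace("_", ""))
                .filter(wanted::contains)
                .distinct()   // count each wanted string at most once
                .count();
        return matched == wanted.size();
    }

    public static void main(String[] args) {
        List<String> myList = Arrays.asList("iscat", "ishooman", "isdoge");
        // "is_Cat" and "is_cat" normalize to the same string; "isdoge" has no match.
        List<String> withDuplicate = Arrays.asList("is_Cat", "is_cat", "is_Hooman");
        System.out.println(allPresent(myList, withDuplicate)); // prints false
    }
}
```

With the plain size comparison, the list with the duplicate would yield three matches and report true even though "isdoge" has no counterpart.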
