How to use a regular expression while searching in a HashSet

I am writing a Java program in which I need to search for a particular word in a Set. The word to search for looks like "wo.d", where '.' can stand for any letter. I am using a regex to match such words.

This is what I have so far:

HashSet<String> words = new HashSet<String>();//this set is already populated
String word = "t.st";
if(word.contains(".")){
    Pattern p = Pattern.compile(word);
    Matcher m;
    boolean match = false;
    for(String setWord : words){
        m = p.matcher(setWord);
        if(m.matches())
            match = true;
    }
    if(match)
        System.out.println("Its a match");
    else
        System.out.println("Its not a match");
}
else{
    System.out.println("The word does not contain regex do other stuff");
}

The code above works, but it is not efficient: it is called many times per second, which causes a noticeable lag in the program.

You need to stop iterating as soon as you get a match. Assuming that you use Java 8, your for loop could be rewritten efficiently as follows:

boolean match = words.stream().anyMatch(w -> p.matcher(w).matches());

You could also parallelize the search by using parallelStream() instead of stream(), especially if your Set contains a lot of words.
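As a minimal, self-contained sketch of the stream-based search, assuming Java 8 or later: the class name WildcardSearch and the helper anyMatch are hypothetical names chosen for illustration, and the sample words are made up. Whether the parallel variant actually helps depends on the size of the set; for small sets the thread overhead may outweigh any gain.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

class WildcardSearch {
    // Hypothetical helper: true if any word in the set fully matches the regex.
    // anyMatch short-circuits, so the search stops at the first match.
    static boolean anyMatch(Set<String> words, String word, boolean parallel) {
        // Compile once; a Pattern is immutable and safe to share across threads,
        // while each call to matcher() creates a fresh Matcher.
        Pattern p = Pattern.compile(word);
        return (parallel ? words.parallelStream() : words.stream())
                .anyMatch(w -> p.matcher(w).matches());
    }

    public static void main(String[] args) {
        // Sample data, assumed for the example only
        Set<String> words = new HashSet<>(Arrays.asList("test", "word", "tent"));
        System.out.println(anyMatch(words, "t.st", false)); // true  ("test")
        System.out.println(anyMatch(words, "w..d", true));  // true  ("word")
        System.out.println(anyMatch(words, "x.st", true));  // false
    }
}
```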

If you don't use Java 8, it could still be done using FluentIterable from Google Guava, but unfortunately without the ability to parallelize the search.

boolean match = FluentIterable.from(words).anyMatch(
    new Predicate<String>() {
        @Override
        public boolean apply(@Nullable final String w) {
            return p.matcher(w).matches();
        }
    }
);

But in your case, I don't believe that using FluentIterable is any better than simply adding a break when you get a match, which is easier to read and maintain:

if (p.matcher(setWord).matches()) {
    match = true;
    break;
}

So, if you really need to use a regular expression and you cannot use Java 8, your best option is to use break as described above; there is no magic trick to consider.


Assuming that you will only ever have one character to replace, it could be done using startsWith(String) and endsWith(String), which should generally be faster than a regular expression. Something like this:

// A TreeSet keeps the words sorted alphabetically; note that the linear
// scan below works the same way with any Set implementation
Set<String> words = new TreeSet<String>(); //this set is already populated
int index = word.indexOf('.');
if (index != -1) {
    String prefix = word.substring(0, index);
    String suffix = word.substring(index + 1);
    boolean match = false;
    for (String setWord : words){
        // From the fastest to the slowest thing to check 
        // to get the best possible performances
        if (setWord.length() == word.length() 
            && setWord.startsWith(prefix) 
            && setWord.endsWith(suffix)) {
            match = true;
            break;
        }
    }
    if(match)
        System.out.println("Its a match");
    else
        System.out.println("Its not a match");
}
else {
    System.out.println("The word does not contain regex do other stuff");
}

Use a TreeSet instead of a HashSet, and test only a subrange of the set.

TreeSet<String> words = new TreeSet<>();// this set is already populated
String word = "t.st";
if (word.contains(".")) {
    String from = word.replaceFirst("\\..*", "");
    String to = from + '\uffff';
    Pattern p = Pattern.compile(word);
    Matcher m;
    boolean match = false;
    for (String setWord : words.subSet(from, to)) {
        m = p.matcher(setWord);
        if (m.matches()) {
            match = true;
            break;
        }
    }
    if (match)
        System.out.println("Its a match");
    else
        System.out.println("Its not a match");
} else {
    System.out.println("The word does not contain regex do other stuff");
}

In this case, words.subSet(from, to) contains only the words that start with "t".
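To make the range concrete, here is a small sketch of what the subSet call returns. The class name SubSetDemo, the helper candidates, and the sample words are assumptions made for this example only; the from/to computation is the same as in the code above.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

class SubSetDemo {
    // Hypothetical helper: the slice of the sorted set whose words start with
    // the literal text that precedes the first '.' in the wildcard word.
    static SortedSet<String> candidates(TreeSet<String> words, String word) {
        String from = word.replaceFirst("\\..*", ""); // "t.st" -> "t"
        String to = from + '\uffff';                   // sorts after every word with that prefix
        return words.subSet(from, to);                 // from is inclusive, to is exclusive
    }

    public static void main(String[] args) {
        // Sample data, assumed for the example only
        TreeSet<String> words = new TreeSet<>(
                Arrays.asList("apple", "tast", "test", "toast", "zebra"));
        System.out.println(candidates(words, "t.st")); // [tast, test, toast]
    }
}
```

Only the words in that slice are then run through the regex. If the pattern starts with '.', the prefix is empty and the slice covers the whole set, so this approach gives no speed-up in that case.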

Just break out of the loop as soon as you get a match, to stop further regex matching against your HashSet:

if(m.matches()) {
   match = true;
   break;
}

Full Code:

HashSet<String> words = new HashSet<String>();//this set is already populated
String word = "t.st";
if(word.contains(".")){
    Pattern p = Pattern.compile(word);
    Matcher m;
    boolean match = false;
    for(String setWord : words){
        m = p.matcher(setWord);
        if(m.matches()) {
            match = true;
            break;
        }
    }
    if(match)
        System.out.println("Its a match");
    else
        System.out.println("Its not a match");
}
else{
    System.out.println("The word does not contain regex do other stuff");
}

Use a hand-written matching method like this:

static boolean match(String wild, String s) {
    int len = wild.length();
    if (len != s.length())
        return false;
    for (int i = 0; i < len; ++i) {
        char w = wild.charAt(i);
        if (w == '.')
            continue;
        else if (w != s.charAt(i))
            return false;
    }
    return true;
}

and call it like this:

HashSet<String> words = new HashSet<>();// this set is already populated
String word = "t.st";
boolean match = false;
if (word.contains(".")) {
    for (String setWord : words) {
        if (match(word, setWord)) {
            match = true;
            break;
        }
    }
    if (match)
        System.out.println("Its a match");
    else
        System.out.println("Its not a match");
} else {
    System.out.println("The word does not contain regex do other stuff");
}
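As a self-contained sketch of this approach, the following class combines the character-by-character comparison with a loop that stops at the first hit. The class name ManualWildcardDemo, the wrapper containsMatch, and the sample words are hypothetical and only for illustration. Unlike a prefix/suffix check, this comparison also handles words with more than one '.'.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class ManualWildcardDemo {
    // Character-by-character comparison: '.' in wild stands for any single character.
    static boolean match(String wild, String s) {
        if (wild.length() != s.length()) {
            return false; // lengths must agree, since '.' replaces exactly one character
        }
        for (int i = 0; i < wild.length(); i++) {
            char w = wild.charAt(i);
            if (w != '.' && w != s.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical wrapper: returns as soon as one word in the set matches.
    static boolean containsMatch(Set<String> words, String wild) {
        for (String setWord : words) {
            if (match(wild, setWord)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample data, assumed for the example only
        Set<String> words = new HashSet<>(Arrays.asList("test", "word", "tent"));
        System.out.println(containsMatch(words, "t.st")); // true  ("test")
        System.out.println(containsMatch(words, "t..t")); // true  ("test" or "tent")
        System.out.println(containsMatch(words, "t.s"));  // false (length differs)
    }
}
```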

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 