I am writing a Java program in which I need to search for a particular word in a Set. The word to search for looks like "wo.d", where '.' can stand for any letter. I am using a regex to match such words.
This is what I have so far:
HashSet<String> words = new HashSet<String>();//this set is already populated
String word = "t.st";
if(word.contains(".")){
Pattern p = Pattern.compile(word);
Matcher m;
boolean match = false;
for(String setWord : words){
m = p.matcher(setWord);
if(m.matches())
match = true;
}
if(match)
System.out.println("Its a match");
else
System.out.println("Its not a match");
}
else{
System.out.println("The word does not contain regex do other stuff");
}
The code above works, but it is not efficient: it is called many times per second, so it causes lag in the program.
You need to stop iterating as soon as you get a match. Assuming that you use Java 8, your for loop could be rewritten efficiently as follows:
boolean match = words.stream().anyMatch(w -> p.matcher(w).matches());
You could also parallelize the search by using parallelStream() instead of stream(), especially if your Set has a lot of words.
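As a minimal sketch of the two variants above, the following self-contained example wraps each in a helper. The class name StreamWildcardSearch, the method names, and the sample words are assumptions made for illustration only. A compiled Pattern can be shared between threads, and each call to matcher() creates a fresh Matcher, so the parallel version does not share mutable state.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original code.
final class StreamWildcardSearch {

    // Stops at the first word that fully matches the pattern (anyMatch short-circuits).
    static boolean anyMatch(Set<String> words, Pattern p) {
        return words.stream().anyMatch(w -> p.matcher(w).matches());
    }

    // Same check on a parallel stream; likely only worthwhile for large sets.
    static boolean anyMatchParallel(Set<String> words, Pattern p) {
        return words.parallelStream().anyMatch(w -> p.matcher(w).matches());
    }

    public static void main(String[] args) {
        // Sample data, assumed for the example.
        Set<String> words = new HashSet<>(Arrays.asList("test", "word", "toast"));
        Pattern p = Pattern.compile("t.st");
        System.out.println(anyMatch(words, p));
        System.out.println(anyMatchParallel(words, p));
    }
}
```

Whether the parallel version is actually faster depends on the size of the set, so it is worth measuring before switching.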
If you don't use Java 8, it could still be done using FluentIterable from Google Guava, but unfortunately without the ability to parallelize the search.
boolean match = FluentIterable.from(words).anyMatch(
new Predicate<String>() {
@Override
public boolean apply(@Nullable final String w) {
return p.matcher(w).matches();
}
}
);
But in your case, I don't believe that FluentIterable is a better choice than simply adding a break when you get a match, as the break is easier to read and maintain:
if (p.matcher(setWord).matches()) {
match = true;
break;
}
So, if you really need to use a regular expression and you cannot use Java 8, your best option is to use break as described above; there is no magic trick to consider.
Assuming that you will only have one character to replace, it could be done using startsWith(String) and endsWith(String), which should be considerably faster than a regular expression. Something like this:
// Your words should be in a TreeSet to be already sorted alphabetically
// in order to get a match as fast as possible
Set<String> words = new TreeSet<String>(); //this set is already populated
int index = word.indexOf('.');
if (index != -1) {
String prefix = word.substring(0, index);
String suffix = word.substring(index + 1);
boolean match = false;
for (String setWord : words){
// From the fastest to the slowest thing to check
// to get the best possible performances
if (setWord.length() == word.length()
&& setWord.startsWith(prefix)
&& setWord.endsWith(suffix)) {
match = true;
break;
}
}
if(match)
System.out.println("Its a match");
else
System.out.println("Its not a match");
}
else {
System.out.println("The word does not contain regex do other stuff");
}
Use a TreeSet instead of a HashSet, and test only a sub-range of the set.
TreeSet<String> words = new TreeSet<>();// this set is already populated
String word = "t.st";
if (word.contains(".")) {
String from = word.replaceFirst("\\..*", "");
String to = from + '\uffff';
Pattern p = Pattern.compile(word);
Matcher m;
boolean match = false;
for (String setWord : words.subSet(from, to)) {
m = p.matcher(setWord);
if (m.matches()) {
match = true;
break;
}
}
if (match)
System.out.println("Its a match");
else
System.out.println("Its not a match");
} else {
System.out.println("The word does not contain regex do other stuff");
}
In this case, words.subSet(from, to) contains only the words that start with "t".
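To make the range selection concrete, here is a small sketch that isolates the subSet step. The class name SubSetRangeDemo, the method name candidates, and the sample words are hypothetical and exist only for this illustration. It assumes no word in the set contains the character '\uffff' directly after the prefix.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical demo class, not part of the original answer.
final class SubSetRangeDemo {

    // Returns the view of words that start with the literal text before the first '.'.
    static SortedSet<String> candidates(TreeSet<String> words, String word) {
        String from = word.replaceFirst("\\..*", ""); // "t.st" -> "t" (lower bound, inclusive)
        String to = from + '\uffff';                  // upper bound, exclusive
        return words.subSet(from, to);
    }

    public static void main(String[] args) {
        // Sample data, assumed for the example.
        TreeSet<String> words =
                new TreeSet<>(Arrays.asList("apple", "test", "toast", "tryst", "word"));
        System.out.println(candidates(words, "t.st")); // prints [test, toast, tryst]
    }
}
```

Only the candidates in this view still need the regex check. If the word starts with '.', the prefix is empty and the range covers essentially the whole set, so nothing is gained in that case.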
Just break out of the loop to stop further regex matching against your HashSet as soon as you get a match:
if(m.matches()) {
match = true;
break;
}
Full Code:
HashSet<String> words = new HashSet<String>();//this set is already populated
String word = "t.st";
if(word.contains(".")){
Pattern p = Pattern.compile(word);
Matcher m;
boolean match = false;
for(String setWord : words){
m = p.matcher(setWord);
if(m.matches()) {
match = true;
break;
}
}
if(match)
System.out.println("Its a match");
else
System.out.println("Its not a match");
}
else{
System.out.println("The word does not contain regex do other stuff");
}
Use your own hand-written matching method instead of a regex, like this:
static boolean match(String wild, String s) {
int len = wild.length();
if (len != s.length())
return false;
for (int i = 0; i < len; ++i) {
char w = wild.charAt(i);
if (w == '.')
continue;
else if (w != s.charAt(i))
return false;
}
return true;
}
and
HashSet<String> words = new HashSet<>();// this set is already populated
String word = "t.st";
boolean match = false;
if (word.contains(".")) {
for (String setWord : words) {
if (match(word, setWord)) {
match = true;
break;
}
}
if (match)
System.out.println("Its a match");
else
System.out.println("Its not a match");
} else {
System.out.println("The word does not contain regex do other stuff");
}
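As a quick check of how this character-by-character comparison behaves, the sketch below repeats the same logic in a hypothetical class named WildcardMatchDemo (the class name and the sample inputs are assumptions for illustration). Unlike the regex version, it handles several '.' wildcards without compiling a Pattern and treats every other character literally.

```java
// Hypothetical demo class; the match logic mirrors the method shown above.
final class WildcardMatchDemo {

    // '.' in wild accepts any single character; all other characters must be equal.
    static boolean match(String wild, String s) {
        int len = wild.length();
        if (len != s.length()) {
            return false; // different lengths can never match
        }
        for (int i = 0; i < len; ++i) {
            char w = wild.charAt(i);
            if (w != '.' && w != s.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(match("t.st", "test"));  // true
        System.out.println(match("t.st", "toast")); // false: lengths differ
        System.out.println(match("t..t", "tart"));  // true: two wildcards
    }
}
```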