
Partial Matching of Regular Expressions

In an NFA it is easy to make every previously non-final state accepting, so that the automaton matches all prefixes (starting substrings) of the strings in a given language.

In the Java regex engine, is there a way to find out whether a string is a starting substring of some string that matches a given regex?

Let regexA be any given regex, and let regexX mean "any start of".

The combined expression "regexXregexA" should then match every starting substring of a string that matches regexA.

Example:

regexA = a*b

"a" matches

"regexXa*b"

because it is the start of "ab" (and of "aab").

Edit:

Since the question was still unclear to some readers, here is a test program that illustrates it:

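import java.util.regex.*;

public class Test1 {
    public static void main(String[] args) {
        String regex = "a*b";
        System.out.println(partialMatch(regex, "aaa"));
    }

    public static boolean partialMatch(String regex, String beginning) {
        // Should return true if some string matches the regex and starts with
        // (but is not equal to) beginning, and false otherwise.
        // The body is what this question asks for, so it is left as a stub.
        throw new UnsupportedOperationException("not implemented yet");
    }
}

With a working partialMatch, this should print true.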

What you're looking for is called partial matching, and it's natively supported by the Java regex API (for the record, other engines that offer this feature include PCRE and boost::regex).

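You can tell whether an input string matched partially by inspecting the result of Matcher.hitEnd(), which reports whether the last match operation reached the end of the input. If matches() returns false and hitEnd() returns true, the match failed only because the input ran out, so more input could still produce a match.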

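import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern pattern = Pattern.compile("a*b");
Matcher matcher = pattern.matcher("aaa");
// Call matches() first: hitEnd() refers to the last match operation.
System.out.println("Matches: " + matcher.matches());
System.out.println("Partial match: " + matcher.hitEnd());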

This outputs:

Matches: false
Partial match: true
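
The check above can be wrapped in a small helper. This is only a sketch: the class name PartialMatchSketch and the method name isPartialMatch are made up for illustration, and the method follows this answer's reading of "partial match" (the full match failed, but the matcher hit the end of the input). It may differ from the question's stricter "starts with but is not equal to" wording for inputs that already match in full.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PartialMatchSketch {
    /**
     * Returns true if the input does not match the regex as a whole, but the
     * matcher ran into the end of the input while trying, so appending more
     * characters might still produce a match.
     */
    static boolean isPartialMatch(String regex, String beginning) {
        Matcher matcher = Pattern.compile(regex).matcher(beginning);
        // matches() must run first: hitEnd() describes the last match operation.
        boolean fullMatch = matcher.matches();
        return !fullMatch && matcher.hitEnd();
    }

    public static void main(String[] args) {
        System.out.println(isPartialMatch("a*b", "aaa")); // the example from the question
        System.out.println(isPartialMatch("a*b", "aac")); // 'c' can never lead to a match
    }
}
```

Patterns with anchors or lookarounds can make hitEnd() harder to interpret, so it is worth testing the helper against the actual patterns in use.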

In an NFA it is easy to make every previously non-final state accepting, so that the automaton matches all prefixes (starting substrings) of the strings in a given language.

Indeed, it can be accomplished by adding a new final state and an ε-move from each state (final or non-final) to the new final state.
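
To make the construction concrete, here is a minimal sketch of a hand-built NFA for a*b. The class and method names are hypothetical. Instead of literally adding a new final state with ε-moves, the sketch treats every reachable state as accepting, which has the same effect as long as every state of the automaton can still reach a final state (true for this small example).

```java
import java.util.HashSet;
import java.util.Set;

final class PrefixNfaSketch {
    // NFA for a*b: state 0 is the start state, state 1 is the only final state.
    static Set<Integer> step(int state, char c) {
        if (state == 0 && c == 'a') return Set.of(0); // loop on 'a'
        if (state == 0 && c == 'b') return Set.of(1); // 'b' ends the match
        return Set.of();                              // no transition: dead end
    }

    // Returns the set of states the NFA can be in after reading the whole input.
    static Set<Integer> run(String input) {
        Set<Integer> current = Set.of(0);
        for (char c : input.toCharArray()) {
            Set<Integer> next = new HashSet<>();
            for (int state : current) {
                next.addAll(step(state, c));
            }
            current = next;
        }
        return current;
    }

    // Ordinary acceptance: the input must end in the final state.
    static boolean accepts(String input) {
        return run(input).contains(1);
    }

    // Prefix acceptance: every state counts as accepting, so any surviving
    // state means the input can still be extended to a full match.
    static boolean acceptsPrefix(String input) {
        return !run(input).isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(accepts("aaa"));       // false
        System.out.println(acceptsPrefix("aaa")); // true
        System.out.println(acceptsPrefix("aac")); // false
    }
}
```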

As far as I know, there is no regex operator equivalent to this operation.

Some regex libraries may provide a way to check whether a string is a partial match of a regex, but I don't know of one. I don't know Java; I work mainly in PHP, which doesn't provide such a feature. There may be libraries that do it, but I have never needed one.

For a small, specific regex you can try to build a new regex that matches the strings that would partially match the original regex, by combining these simple rules:

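  • a -> a?
  • ab -> (ab?)?
  • a* -> a*
  • a+ -> a*
  • a|b -> (a|b)?
  • etc.

As written, the rules hold when a and b are single characters. For compound sub-regexps, apply them recursively: for example, a concatenation AB becomes A'|AB', where A' and B' are the rewritten forms of A and B. Use parentheses as needed.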
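
As a worked example, applying the rules by hand to the question's regex a*b gives a*b? (a* stays a*, and b becomes b?). The sketch below only checks that hand-derived result; the class and method names are hypothetical. Note that the rewritten regex also accepts the empty string and full matches such as "aab".

```java
import java.util.regex.Pattern;

final class RewrittenRegexSketch {
    // Hand-derived partial-match regex for a*b: a* stays a*, b becomes b?.
    static final Pattern PARTIAL = Pattern.compile("a*b?");

    // True if the input is a starting substring of some string matching a*b.
    static boolean isPrefixOfMatch(String input) {
        return PARTIAL.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"", "aaa", "aab", "aac", "ba"}) {
            System.out.println("\"" + s + "\" -> " + isPrefixOfMatch(s));
        }
    }
}
```

The rewritten pattern works with any regex engine, at the cost of having to derive it by hand for each original regex.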
