
Regular expression matching in Java without using regex

I recently got an interview question where I had to implement regular expression matching in Java without using the built-in regex support.

Given an input string (s) and a pattern (p), implement regular expression matching with support for '.', '+', '*' and '?'.


// (pattern, match) --> bool does it match
// operators:
// . --> any char
// + --> 1 or more of prev char
// * --> 0 or more of prev char
// ? --> 0 or 1 of prev char

// (abc+c, abcccc) --> True
// (abcd*, abc) --> True
// (abc?c, abc) --> True
// (abc.*, abcsdfsf) --> True

I came up with the code below, which only implements '.' and '*', but I can't figure out how to implement the others:

  public static boolean isMatch(String s, String p) {
    if (p.length() == 0) {
      return s.length() == 0;
    }
    if (p.length() > 1 && p.charAt(1) == '*') { // second char is '*'
      if (isMatch(s, p.substring(2))) {
        return true;
      }
      if (s.length() > 0 && (p.charAt(0) == '.' || s.charAt(0) == p.charAt(0))) {
        return isMatch(s.substring(1), p);
      }
      return false;
    } else { // second char is not '*'
      if (s.length() > 0 && (p.charAt(0) == '.' || s.charAt(0) == p.charAt(0))) {
        return isMatch(s.substring(1), p.substring(1));
      }
      return false;
    }
  }
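For reference, here is one way the recursive idea above might be extended to '+' and '?'. This is a minimal sketch that assumes the same whole-string (anchored) semantics as the code in the question and treats the character after each pattern element as its quantifier. It passes indices instead of creating substrings. The class name RecursiveMatcher and the helper matchFrom are illustrative, not part of the original question. '+' is handled as "match once here, then either stop or keep repeating", and '?' as "skip this element, or match it exactly once".

```java
class RecursiveMatcher {
    // Returns true if pattern p matches the whole string s.
    // '.' matches any char; '*', '+', '?' apply to the preceding pattern char.
    static boolean isMatch(String s, String p) {
        return matchFrom(s, 0, p, 0);
    }

    // Does p starting at index j match s starting at index i, through to the end?
    private static boolean matchFrom(String s, int i, String p, int j) {
        if (j == p.length()) {
            return i == s.length();
        }
        // Does the current pattern char match the current string char?
        boolean first = i < s.length()
                && (p.charAt(j) == '.' || p.charAt(j) == s.charAt(i));
        // '\0' means "no quantifier follows this element".
        char quant = j + 1 < p.length() ? p.charAt(j + 1) : '\0';

        if (quant == '*') {
            // Zero occurrences, or consume one and stay on this element.
            return matchFrom(s, i, p, j + 2)
                    || (first && matchFrom(s, i + 1, p, j));
        } else if (quant == '+') {
            // Must consume one; then stop, or stay on this element for more.
            return first
                    && (matchFrom(s, i + 1, p, j + 2) || matchFrom(s, i + 1, p, j));
        } else if (quant == '?') {
            // Zero occurrences, or exactly one.
            return matchFrom(s, i, p, j + 2)
                    || (first && matchFrom(s, i + 1, p, j + 2));
        } else {
            // Plain char or '.': must match exactly once.
            return first && matchFrom(s, i + 1, p, j + 1);
        }
    }

    public static void main(String[] args) {
        System.out.println(isMatch("abcccc", "abc+c"));   // true
        System.out.println(isMatch("abc", "abcd*"));      // true
        System.out.println(isMatch("abc", "abc?c"));      // true
        System.out.println(isMatch("abcsdfsf", "abc.*")); // true
        System.out.println(isMatch("ab", "abc+"));        // false
    }
}
```

Like the original, this backtracks, so some patterns (for example many consecutive "a*" elements against a long string that cannot match) can take exponential time.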

Also, what is the best way to approach this problem?
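One common answer to the "best way" question is dynamic programming over the two positions. The sketch below assumes whole-string (anchored) matching, and the class name DpMatcher is illustrative. It fills a table where dp[i][j] records whether s from index i matches p from index j, using the same four cases as a recursive matcher, but each (i, j) pair is evaluated only once. That gives O(|s|·|p|) time and space instead of potentially exponential backtracking.

```java
class DpMatcher {
    // Returns true if pattern p matches the whole string s.
    // dp[i][j] == true iff s.substring(i) fully matches p.substring(j).
    static boolean isMatch(String s, String p) {
        int n = s.length(), m = p.length();
        boolean[][] dp = new boolean[n + 1][m + 1];
        dp[n][m] = true; // empty pattern matches empty string; dp[i][m] stays false for i < n

        // Fill from the end so dp[i + 1][*] and dp[i][j + k] are ready when needed.
        for (int i = n; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                boolean first = i < n && (p.charAt(j) == '.' || p.charAt(j) == s.charAt(i));
                char quant = j + 1 < m ? p.charAt(j + 1) : '\0';
                if (quant == '*') {
                    // Zero occurrences, or one and stay on this element.
                    dp[i][j] = dp[i][j + 2] || (first && dp[i + 1][j]);
                } else if (quant == '+') {
                    // One occurrence, then stop or repeat.
                    dp[i][j] = first && (dp[i + 1][j + 2] || dp[i + 1][j]);
                } else if (quant == '?') {
                    // Zero or one occurrence.
                    dp[i][j] = dp[i][j + 2] || (first && dp[i + 1][j + 2]);
                } else {
                    // Exactly one occurrence of a plain char or '.'.
                    dp[i][j] = first && dp[i + 1][j + 1];
                }
            }
        }
        return dp[0][0];
    }

    public static void main(String[] args) {
        System.out.println(isMatch("abcccc", "abc+c"));   // true
        System.out.println(isMatch("abc", "abc?c"));      // true
        System.out.println(isMatch("abcc", "abc?"));      // false
    }
}
```

Table entries whose j points at a quantifier are computed but never read, because every case jumps over the quantifier with j + 2.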

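Here is untested code. The idea is to keep track of where we are in both the string and the pattern. Note that it reports a match if the pattern matches starting at any position in the string, and it ignores whatever follows the match. This would NOT be a good approach to extend into a full RE engine (just consider what adding parentheses would take), but it is fine for this case: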

public static boolean isMatch (String p, String s, int pos_p, int pos_s) {
    if (pos_p == p.length()) {
        // We matched the whole pattern (whatever follows in s is ignored).
        return true;
    }
    else if (pos_p == -1) {
        // pos_s is one before the next start position to try.
        if (pos_s == s.length()) {
            // No start positions left.
            return false;
        }
        // Do we match the pattern starting at the next position?
        if (isMatch(p, s, 0, pos_s + 1)) {
            return true;
        }
        else {
            // Try to match the pattern starting later.
            return isMatch(p, s, -1, pos_s + 1);
        }
    }
    else {
        char thisCharP = p.charAt(pos_p);
        // '\0' means "no quantifier follows".
        char nextCharP = pos_p + 1 < p.length() ? p.charAt(pos_p + 1) : '\0';

        // Does this character match at this position? Past the end of s it cannot,
        // but '*' and '?' may still match zero times, so we must not fail early.
        boolean thisMatch = pos_s < s.length()
                && (thisCharP == '.' || thisCharP == s.charAt(pos_s));

        if (nextCharP == '*') {
            // Try matching zero times first; matching 1+ times requires thisMatch.
            return isMatch(p, s, pos_p + 2, pos_s)
                    || (thisMatch && isMatch(p, s, pos_p, pos_s + 1));
        }
        else if (nextCharP == '+') {
            // To match 1+, we have to match here; then stop, or try for 2+.
            return thisMatch
                    && (isMatch(p, s, pos_p + 2, pos_s + 1) || isMatch(p, s, pos_p, pos_s + 1));
        }
        else if (nextCharP == '?') {
            // Match zero times, or exactly once.
            return isMatch(p, s, pos_p + 2, pos_s)
                    || (thisMatch && isMatch(p, s, pos_p + 2, pos_s + 1));
        }
        else {
            // No quantifier: this char must match, then the rest of the pattern.
            return thisMatch && isMatch(p, s, pos_p + 1, pos_s + 1);
        }
    }
}

public static boolean isMatch (String p, String s) {
    // Can we match starting anywhere?
    return isMatch(p, s, -1, -1);
}
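Because the backtracking above does not scale well, one common alternative is to simulate the pattern as a small nondeterministic automaton, in the spirit of Thompson's NFA simulation: keep the set of pattern positions that are still alive and advance all of them together for each input character, so there is no backtracking. The sketch below assumes whole-string (anchored) matching and well-formed patterns (a quantifier always follows a character). The class name NfaMatcher, the Token helper, and the rewrite of "x+" as "x" followed by "x*" are choices made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class NfaMatcher {
    // One pattern element: a literal char (or '.') plus a quantifier.
    private static final class Token {
        final char c;     // literal char, or '.' for any char
        final char quant; // '\0' (exactly once), '*' or '?'
        Token(char c, char quant) { this.c = c; this.quant = quant; }
        boolean matches(char ch) { return c == '.' || c == ch; }
    }

    // Compile p into tokens, rewriting "x+" as "x" followed by "x*".
    private static List<Token> compile(String p) {
        List<Token> tokens = new ArrayList<>();
        for (int j = 0; j < p.length(); j++) {
            char c = p.charAt(j);
            char q = j + 1 < p.length() ? p.charAt(j + 1) : '\0';
            if (q == '*' || q == '?') {
                tokens.add(new Token(c, q));
                j++; // skip the quantifier
            } else if (q == '+') {
                tokens.add(new Token(c, '\0'));
                tokens.add(new Token(c, '*'));
                j++;
            } else {
                tokens.add(new Token(c, '\0'));
            }
        }
        return tokens;
    }

    // Mark state t, plus every later state reachable by skipping optional tokens.
    private static void addWithClosure(boolean[] set, List<Token> tokens, int t) {
        while (!set[t]) {
            set[t] = true;
            if (t == tokens.size() || tokens.get(t).quant == '\0') {
                return;
            }
            t++; // '*' and '?' may match zero times
        }
    }

    // Returns true if pattern p matches the whole string s.
    static boolean isMatch(String s, String p) {
        List<Token> tokens = compile(p);
        int k = tokens.size(); // state k means "whole pattern matched"
        boolean[] current = new boolean[k + 1];
        addWithClosure(current, tokens, 0);
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            boolean[] next = new boolean[k + 1];
            for (int t = 0; t < k; t++) {
                if (current[t] && tokens.get(t).matches(ch)) {
                    // '*' may repeat, so stay on t; otherwise move past the token.
                    addWithClosure(next, tokens, tokens.get(t).quant == '*' ? t : t + 1);
                }
            }
            current = next;
        }
        return current[k];
    }

    public static void main(String[] args) {
        System.out.println(isMatch("abcccc", "abc+c")); // true
        System.out.println(isMatch("abc", "abcd*"));    // true
    }
}
```

This runs in O(|s|·|p|) time with O(|p|) extra space per step. To get the "match anywhere" behavior of the answer's code with an anchored matcher like this, one option is to wrap a well-formed pattern as ".*" + p + ".*".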
