
Match multiple characters without repetition in a regular expression

I'm using PHP's PCRE, and there is one part of the regex I can't work out. I have a character class of 5 characters, [adjxz], which may or may not appear, in any order, after a token (|) in the string. Any of them can appear, but each one can appear only once. For example:

 *|ad     - is valid
 *|dxa    - is valid
 *|da     - is valid
 *|a      - is valid
 *|aaj    - is *not* valid
 *|adjxz  - is valid
 *|addjxz - is *not* valid

Any idea how I can do it? A simple [adjxz]+, or even [adjxz]{1,5}, does not work because both allow repetition. Since the order does not matter either, I can't use /a?d?j?x?z?/, so I'm at a loss.

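You could use a negative lookahead combined with a backreference. The lookahead rejects the match if any character of the class is followed later by a second copy of itself: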

\|(?![adjxz]*([adjxz])[adjxz]*\1)[adjxz]{1,5}
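
A minimal PHP sketch of how this pattern could be checked against the examples from the question. The leading ^\* and the trailing $ are assumptions added here so that the whole string is validated rather than searched, and hasValidFlags is a hypothetical helper name.

```php
<?php
// Hypothetical helper: true when $str is "*|" followed by 1 to 5 distinct
// characters from [adjxz]. The ^\* prefix and $ anchor are assumptions,
// not part of the pattern as posted.
function hasValidFlags(string $str): bool
{
    $pattern = '/^\*\|(?![adjxz]*([adjxz])[adjxz]*\1)[adjxz]{1,5}$/';
    return preg_match($pattern, $str) === 1;
}

// Run the examples from the question through the helper.
foreach (['*|ad', '*|dxa', '*|da', '*|a', '*|aaj', '*|adjxz', '*|addjxz'] as $s) {
    echo $s, ' => ', hasValidFlags($s) ? 'valid' : 'not valid', PHP_EOL;
}
```

Because the class has only five members and none may repeat, the upper bound of 5 is already implied by the lookahead; the {1,5} quantifier mainly documents the intent.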


If you know these characters are followed by something else, e.g. whitespace, you can simplify this to:

\|(?!\S*(\S)\S*\1)[adjxz]{1,5}
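
As a rough illustration of the simplified pattern, the sketch below runs it over a few made-up inputs in which the flags are followed by a space and more text. The sample strings are assumptions chosen for the demonstration, not data from the question.

```php
<?php
// The simplified pattern from the answer, wrapped in PCRE delimiters.
$pattern = '/\|(?!\S*(\S)\S*\1)[adjxz]{1,5}/';

// Hypothetical inputs: flags followed by whitespace and further text.
$samples = ['*|ad rest', '*|dxa rest', '*|aaj rest', '*|addjxz rest'];

$results = [];
foreach ($samples as $s) {
    // preg_match() returns 1 when the pattern is found somewhere in $s.
    $results[$s] = preg_match($pattern, $s) === 1;
    echo $s, ' => ', $results[$s] ? 'match' : 'no match', PHP_EOL;
}
```

Note that the lookahead inspects the whole run of non-whitespace characters after the token, so this shortcut is only safe when the flags really are followed by whitespace or the end of the string.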

I think you should break this into 2 steps:

  1. A regex to check for unexpected characters
  2. A simple PHP check for duplicated characters
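function strIsValid($str) {
    // Step 1: require "*|" followed only by characters from [adjxz].
    // The "|" must be escaped; unescaped, it acts as alternation.
    if (!preg_match('/^\*\|([adjxz]+)$/', $str, $matches)) {
        return false;
    }

    // Step 2: there are no duplicates if the unique-character count equals the length.
    return strlen($matches[1]) === count(array_unique(str_split($matches[1])));
}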
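
One possible variant of the same two-step idea, sketched under the assumption that the input is plain single-byte text: count_chars() in mode 3 returns a string holding each distinct byte once, so it can stand in for the array_unique(str_split(...)) step. The name hasUniqueFlags is hypothetical.

```php
<?php
// Hypothetical variant of the two-step check described above.
function hasUniqueFlags(string $str): bool
{
    // Step 1: accept only "*|" followed by characters from [adjxz].
    if (preg_match('/^\*\|([adjxz]+)$/', $str, $m) !== 1) {
        return false;
    }

    // Step 2: count_chars() in mode 3 returns every distinct byte once,
    // so a shorter result means at least one character was repeated.
    return strlen(count_chars($m[1], 3)) === strlen($m[1]);
}

foreach (['*|ad', '*|aaj', '*|adjxz', '*|addjxz'] as $s) {
    echo $s, ' => ', hasUniqueFlags($s) ? 'valid' : 'not valid', PHP_EOL;
}
```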

I suggest using reverse logic: match the unwanted case, a character that is repeated later in the string, and treat the input as invalid when this pattern matches:
\|.*?([adjxz])(?=.*\1)
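
A short PHP sketch of how the reverse-logic pattern might be applied. Since the pattern only detects repeated characters, the sketch first checks that the string contains nothing but the allowed characters; that extra check and the helper name isValidByReverseLogic are assumptions added for illustration.

```php
<?php
// Hypothetical helper combining an allowed-characters check with the
// "reverse" pattern that matches only the unwanted (duplicate) case.
function isValidByReverseLogic(string $str): bool
{
    // Assumption: reject anything that is not "*|" plus characters from [adjxz].
    if (preg_match('/^\*\|[adjxz]+$/', $str) !== 1) {
        return false;
    }

    // The pattern finds a character that occurs again later in the string,
    // so a match here means the input is invalid.
    return preg_match('/\|.*?([adjxz])(?=.*\1)/', $str) !== 1;
}

foreach (['*|ad', '*|aaj', '*|adjxz', '*|addjxz'] as $s) {
    echo $s, ' => ', isValidByReverseLogic($s) ? 'valid' : 'not valid', PHP_EOL;
}
```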
