Normalizing Regular Expression String

I have been reading a book on frameworks called Pro PHP MVC. In the book, a StringMethods class is created. The code looks like this:

class StringMethods
{

    private static $_delimiter = "#";

    private function __construct()
    {
        // do nothing
    }

    private function __clone()
    {
        // do nothing
    }

    private static function _normalize($pattern)
    {
        return self::$_delimiter.trim($pattern, self::$_delimiter).self::$_delimiter;
    }

    public static function getDelimiter()
    {
        return self::$_delimiter;
    }

    public static function setDelimiter($delimiter)
    {
        self::$_delimiter = $delimiter;
    }

    public static function match($string, $pattern)
    {
        preg_match_all(self::_normalize($pattern), $string, $matches, PREG_PATTERN_ORDER);
        if(!empty($matches[1]))
        {
            return $matches[1];
        }
        if(!empty($matches[0]))
        {
            return $matches[0];
        }

        return null;
    }

    public static function split($string, $pattern, $limit = null)
    {
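        // The code as posted ORs PREG_SPLIT_NO_EMPTY with itself, which is the same as using it once.
        $flags = PREG_SPLIT_NO_EMPTY;
        // preg_split() expects an integer limit; -1 means "no limit".
        return preg_split(self::_normalize($pattern), $string, $limit ?? -1, $flags);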
    }

}

My question is: what is $_delimiter for? What purpose does it serve in the _normalize() function? Is it something to do with regular expressions? I'm not very familiar with them, other than the fact that a regular expression is a custom pattern used to match parts of a string.

The book explanation was as follows:

The $delimiter and _normalize() members are all for the normalization of regular expression strings, so that the remaining methods can operate on them without first having to check or normalize them. The match() and split() methods perform similarly to the preg_match_all() and preg_split() functions, but require less formal structure to the regular expressions, and return a more predictable set of results. The match() method will return the first captured substring, the entire substring match, or null. The split() method will return the results of a call to the preg_split() function, after setting some flags and normalizing the regular expression.

Thanks for the help in advance.

It appears that the _normalize() function adds the pattern delimiter to the regular expression pattern for you. The code uses the hash character "#". The function strips any hash characters that may already be at the start and end of the pattern, then adds the delimiters back. The most common pattern delimiter is the forward slash "/" (e.g. /your[reg]ex[here]/).

The end result is that it does not matter if you type

your[reg]ex[here]

or

#your[reg]ex[here]#

Both patterns will work fine.
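As a minimal sketch of that behaviour: the function below is a hypothetical standalone copy of the private _normalize() method (the name normalizePattern and its $delimiter parameter are assumptions made for illustration, not part of the book's class).

```php
<?php
// Hypothetical stand-in for the private StringMethods::_normalize() method.
function normalizePattern(string $pattern, string $delimiter = "#"): string
{
    // Strip any delimiter characters from both ends, then add exactly one back.
    return $delimiter . trim($pattern, $delimiter) . $delimiter;
}

$bare    = normalizePattern("your[reg]ex[here]");
$wrapped = normalizePattern("#your[reg]ex[here]#");

// Both inputs end up as the same delimited pattern: #your[reg]ex[here]#
var_dump($bare === $wrapped); // bool(true)
```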

The PCRE functions in PHP all require that the regular expression begin and end with matching delimiter characters, so that optional modifiers can be put after the second delimiter, e.g.

preg_match('/foo/i', $string);

The delimiters in that case are the / characters, and i is the modifier.
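A short sketch of the same idea with a different delimiter may help. The sample string and variable names here are made up for illustration; only preg_match() itself comes from PHP.

```php
<?php
$string = "Foo bar";

// "/" delimiters, with the "i" (case-insensitive) modifier after the closing one.
$withSlash = preg_match('/foo/i', $string);

// "#" works just as well as a delimiter, which is what the class in the question uses.
$withHash = preg_match('#foo#i', $string);

// Without the modifier the match is case-sensitive, so "foo" does not match "Foo".
$caseSensitive = preg_match('#foo#', $string);

var_dump($withSlash, $withHash, $caseSensitive); // int(1), int(1), int(0)
```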

Your class allows you to wrap the regexp in delimiters, but doesn't require it. $_delimiter is the character it expects you to use as the delimiter. The _normalize() method adds the delimiters if they're not already there, before the pattern is passed to preg_match_all() or preg_split(). It does this by calling trim($pattern, self::$_delimiter) to remove any delimiters that are already there, and then concatenating a delimiter at each end.
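To see what this means for callers, here is a rough sketch that copies the logic of match() and split() into standalone functions. The names normalizePattern, matchPattern and splitPattern are hypothetical, the delimiter is hard-coded to "#", and the sample inputs are invented.

```php
<?php
// Hypothetical standalone copies of the book's logic, with "#" as the delimiter.
function normalizePattern(string $pattern): string
{
    return "#" . trim($pattern, "#") . "#";
}

function matchPattern(string $string, string $pattern): ?array
{
    preg_match_all(normalizePattern($pattern), $string, $matches, PREG_PATTERN_ORDER);
    if (!empty($matches[1])) {
        return $matches[1]; // what the first capture group matched, for every match
    }
    if (!empty($matches[0])) {
        return $matches[0]; // the whole matches, when there is no capture group
    }
    return null; // nothing matched
}

function splitPattern(string $string, string $pattern, int $limit = -1): array
{
    // PREG_SPLIT_NO_EMPTY drops empty pieces from the result.
    return preg_split(normalizePattern($pattern), $string, $limit, PREG_SPLIT_NO_EMPTY);
}

$text = "id=12, id=34";

$captured = matchPattern($text, 'id=(\d+)');   // ["12", "34"]
$whole    = matchPattern($text, '#id=\d+#');   // ["id=12", "id=34"]
$none     = matchPattern($text, 'xyz');        // null
$parts    = splitPattern("a, b,,c", ',\s*');   // ["a", "b", "c"]
```

Note that the patterns are passed with or without the "#" delimiters, and the caller never has to check which form was used.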
