
Can you explain/simplify this regular expression (PCRE) in PHP?

preg_match('/.*MyString[ (\/]*([a-z0-9\.\-]*)/i', $contents, $matches);

I need to debug this one. I have a good idea of what it's doing but since I was never an expert at regular expressions I need your help.

Can you tell me what it does block by block (so I can learn)?

Can the syntax be simplified (I think there is no need to escape the dot with a backslash)?

The regexp...

'/.*MyString[ (\/]*([a-z0-9\.\-]*)/i'

.* matches any character zero or more times
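One consequence of the leading .* that is worth knowing when debugging: it is greedy, so if the marker appears more than once, the last occurrence is the one that ends up being used. A minimal sketch, assuming a made-up input string with two occurrences (the input and variable names are only for illustration):

```php
<?php
// Hypothetical input containing the marker twice.
$input = 'MyString(first.org) and MyString(second.org)';

// With the leading greedy .*, the engine first consumes the whole line and
// then backtracks, so the LAST "MyString" is the one that matches.
preg_match('/.*MyString[ (\/]*([a-z0-9.\-]*)/i', $input, $greedy);

// Without the leading .*, preg_match scans left to right and stops at the
// FIRST "MyString".
preg_match('/MyString[ (\/]*([a-z0-9.\-]*)/i', $input, $first);

echo $greedy[1], "\n"; // second.org
echo $first[1], "\n";  // first.org
```

Also note that, without the s modifier, the dot does not match a newline, so .* never extends past the end of a line.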

MyString matches that string. But you are using case-insensitive matching, so the matched string will spell "mystring" with any capitalization

EDIT: (Thanks to Alan Moore) [ (\/]* matches any of the characters space, ( or /, repeated zero or more times. As Alan points out, the / is escaped only to stop it being treated as the regexp delimiter.

EDIT: The ( does not need escaping and neither does the . (thanks AlexV) because:

All non-alphanumeric characters other than \, -, ^ (at the start) and the terminating ] are non-special in character classes, but it does no harm if they are escaped. -- http://www.php.net/manual/en/regexp.reference.character-classes.php

The hyphen generally does need to be escaped, otherwise it will try to define a range. For example:

[A-Z]  // matches all upper case letters of the alphabet
[A\-Z] // matches 'A', '-', and 'Z'

However, where the hyphen is at the end of the list you can get away with not escaping it (but it is always best to be in the habit of escaping it... I got caught out by this).
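The three hyphen cases above can be checked directly. This is a small sketch with throwaway variable names, anchored with ^ and $ so each pattern has to match exactly one character:

```php
<?php
// Range: any single uppercase letter from A to Z.
$rangeM = preg_match('/^[A-Z]$/', 'M');        // 1

// Escaped hyphen: only the three literal characters A, - and Z.
$listM    = preg_match('/^[A\-Z]$/', 'M');     // 0
$listDash = preg_match('/^[A\-Z]$/', '-');     // 1

// An unescaped hyphen at the end of the list is also taken literally.
$trailingDash = preg_match('/^[AZ-]$/', '-');  // 1

var_dump($rangeM, $listM, $listDash, $trailingDash);
```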

([a-z0-9\.\-]*) matches any run of the characters a through z (again affected by the case-insensitive flag, so A through Z as well), 0 through 9, a dot, or a hyphen, repeated zero or more times. The surrounding () tell preg_match to "capture" this string, which means that $matches[1] will contain the text matched by [a-z0-9\.\-]*.

For example:

<?php
  $input = "aslghklfjMyString(james321-james.org)blahblahblah";
  preg_match('/.*MyString[ (\/]*([a-z0-9.\-]*)/i', $input, $matches);
  print_r($matches);
?>

outputs

Array
(
    [0] => aslghklfjMyString(james321-james.org
    [1] => james321-james.org
)

Note that because you use a case insensitive match...

$input = "aslghklfjmYsTrInG(james321898-james.org)blahblahblah";

will also match; $matches[1] then contains james321898-james.org (the digits differ from the first example, but the capture works the same way).
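Putting the escaping notes together, here is one possible simplification of the pattern. It is only a sketch: the ~ delimiter is an arbitrary choice (any delimiter that does not occur in the pattern would do), and the sample strings are made up. The loop checks that the original and the simplified pattern behave identically on them:

```php
<?php
$original = '/.*MyString[ (\/]*([a-z0-9\.\-]*)/i';

// Same pattern with the optional escapes removed. Using "~" as the
// delimiter means "/" no longer needs a backslash; the dot is literal in a
// character class, and the hyphen is literal because it comes last.
$simplified = '~.*MyString[ (/]*([a-z0-9.-]*)~i';

// Hypothetical sample inputs, including one that should not match.
$samples = [
    'xxMyString(james321-james.org)yy',
    'mystring / example.com',
    'no marker here',
];

$allSame = true;
foreach ($samples as $s) {
    $r1 = preg_match($original, $s, $m1);
    $r2 = preg_match($simplified, $s, $m2);
    // Both the return value and the captured groups must agree.
    $allSame = $allSame && ($r1 === $r2) && ($m1 === $m2);
}

var_dump($allSame); // bool(true)
```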

Hope this helps....

Let's break this down step by step, removing the explained parts from the expression.

"/.*MyString[ (\/]*([a-z0-9\.\-]*)/i"

Let's first strip the regex delimiters (/i at the end means it's case-insensitive):

".*MyString[ (\/]*([a-z0-9\.\-]*)"

Then we've got a greedy wildcard, .* (it matches any character except a newline, any number of times, and backtracks until the rest of the pattern can match):

"MyString[ (\/]*([a-z0-9\.\-]*)"

Then match 'MyString' literally, followed by any number (note the '*') of any of the following: ' ', '(', '/'. The '(' is literal inside a character class and does not need escaping; the '/' is escaped only because '/' is also the pattern delimiter.

"([a-z0-9\.\-]*)"

Then we get a capture group for any number of any of the following: the letters a-z, the digits 0-9, '.', or '-'.
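Because both character classes use '*', "any number" includes zero, which can be surprising while debugging: the pattern still matches when nothing capturable follows the marker, and $matches[1] is then an empty string. A minimal sketch with a made-up input; switching the capture to '+' is shown only as one possible way to require at least one character, not as something the original pattern does:

```php
<?php
// Hypothetical input: the marker is followed by ":" which is in neither
// character class.
$input = 'MyString: value';

// Original quantifier (*): the match succeeds with an empty capture.
$resultStar = preg_match('/.*MyString[ (\/]*([a-z0-9.\-]*)/i', $input, $star);

// With + on the capture group, at least one character is required, so the
// same input does not match at all.
$resultPlus = preg_match('/.*MyString[ (\/]*([a-z0-9.\-]+)/i', $input, $plus);

var_dump($resultStar); // int(1)
var_dump($star[1]);    // string(0) ""
var_dump($resultPlus); // int(0)
```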

That's pretty much all of it.
