PHP Regex to find pattern and wrap in anchor tags

I have a string containing movie titles and release years. I want to detect the Title (Year) pattern and, if it matches, wrap it in anchor tags.

Wrapping it is easy. But is it possible to write a regex to match this pattern if I don't know what the name of the movie will be?

Example:

$str = 'A random string with movie titles in it. 
Movies like The Thing (1984) and other titles like Captain America Civil War (2016). 
The movies could be anywhere in this string. 
And some movies like 28 Days Later (2002) could start with a number.';

So the pattern will always be a Title (starting with an uppercase letter or, as in the last example, a number) and will end with (Year).

This is what I have got so far:

if(preg_match('/^\p{Lu}[\w%+\/-]+\([0-9]+\)/', $str)){
    error_log('MATCH');
}
else{
    error_log('NO MATCH');
}

This currently does not work. From what I understand, this is what it should do:

^\p{Lu} //match a word beginning with an uppercase letter

[\w%+\/-] //with any number of characters following it

+\([0-9]+\) //ending with an integer

Where am I going wrong with this?

The following regex should do it:

(?-i)(?<=[a-z]\s)[A-Z\d].*?\(\d+\)

Explanation

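  • (?-i) case-sensitive matching (already the PCRE default)
  • (?<=[a-z]\s) look-behind for a lower-case letter followed by a whitespace character
  • [A-Z\d] match an upper-case letter or a digit
  • .*? lazily match any characters (as few as possible)
  • \(\d+\) match one or more digits enclosed in literal parentheses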

PHP

<?php
$regex = '/(?-i)(?<=[a-z]\s)[A-Z\d].*?\(\d+\)/';
$str = 'A random string with movie titles in it.
       Movies like The Thing (1984) and other titles like Captain America Civil War (2016).
       The movies could be anywhere in this string.
       And some movies like 28 Days Later (2002) could start with a number.';
preg_match_all($regex, $str, $matches);
print_r($matches);
?>

This regex does the job:

~(?:[A-Z][a-zA-Z]+\s+|\d+\s+)+\(\d+\)~

Explanation:

~               : regex delimiter
  (?:           : start non capture group
    [A-Z]       : 1 capital letter (use \p{Lu} to match titles in any language)
    [a-zA-Z]+   : 1 or more letters (use \p{L} to match titles in any language)
    \s+         : 1 or more spaces
   |            : OR
    \d+         : 1 or more digits
    \s+         : 1 or more spaces
  )+            : end group, repeated 1 or more times
  \(\d+\)       : 1 or more digits surrounded by parentheses (use \d{4} if the year is always 4 digits)
~               : regex delimiter
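
The alternatives mentioned in the explanation above (\p{Lu}, \p{L} and \d{4}) could be combined roughly as follows. This is only a sketch: it assumes the input string and the source file are UTF-8 encoded, and the sample sentence is made up for illustration. The trailing u modifier is what makes PCRE treat the pattern and subject as UTF-8.

```php
<?php
// Sketch: Unicode-aware variant of the pattern explained above.
// Assumption: the subject string is valid UTF-8 (required by the "u" modifier).
//   \p{Lu}  any uppercase letter, in any script
//   \p{L}+  one or more letters, in any script
//   \d{4}   exactly four digits for the year
$pattern = '~(?:\p{Lu}\p{L}+\s+|\d+\s+)+\(\d{4}\)~u';

// Hypothetical sample text with accented titles.
$str = 'Films like Amélie (2001) and Élisa (1995) use accented letters.';

preg_match_all($pattern, $str, $match);
print_r($match[0]);
```

Like the ASCII version, this still requires every word of the title to start with an uppercase letter and to be at least two letters long, so a title containing a lowercase word would only be matched from its last run of capitalised words.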

Implementation:

$str = 'A random string with movie titles in it. 
Movies like The Thing (1984) and other titles like Captain America Civil War (2016). 
The movies could be anywhere in this string. 
And some movies like 28 Days Later (2002) could start with a number.';

if (preg_match_all('~(?:[A-Z][a-zA-Z]+\s+|\d+\s+)+\(\d+\)~', $str, $match)) {
    print_r($match);
    error_log('MATCH');
}
else{
    error_log('NO MATCH');
}

Result:

Array
(
    [0] => Array
        (
            [0] => The Thing (1984)
            [1] => Captain America Civil War (2016)
            [2] => 28 Days Later (2002)
        )

)
MATCH
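
To go from matching to the wrapping the question asks about, one option is preg_replace_callback with the same pattern. The sketch below is an illustration only: the /search?q=... URL is a hypothetical placeholder, and the shortened sample string is not from the question.

```php
<?php
// Sketch: wrap every "Title (Year)" match in an anchor tag.
// Assumption: the href format (/search?q=...) is a made-up placeholder.
$pattern = '~(?:[A-Z][a-zA-Z]+\s+|\d+\s+)+\(\d+\)~';

$str = 'Movies like The Thing (1984) and 28 Days Later (2002).';

$linked = preg_replace_callback(
    $pattern,
    function (array $m): string {
        $title = $m[0]; // full match, e.g. "The Thing (1984)"
        // URL-encode the title for the query string, HTML-escape it for the link text.
        return '<a href="/search?q=' . urlencode($title) . '">'
            . htmlspecialchars($title) . '</a>';
    },
    $str
);

echo $linked, PHP_EOL;
```

A callback is used rather than a plain preg_replace with $0 so that the title can be escaped separately for the URL and for the visible link text.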
