PHP Regex to find pattern and wrap in anchor tags

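I have strings containing movie titles and release years. I want to detect the "Title (Year)" pattern and, if it matches, wrap it in anchor tags.

The wrapping is easy. But is it possible to write a regex that matches this pattern when I don't know the movie's name in advance?

Example: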

$str = 'A random string with movie titles in it. 
Movies like The Thing (1984) and other titles like Captain America Civil War (2016). 
The movies could be anywhere in this string. 
And some movies like 28 Days Later (2002) could start with a number.';

So the pattern will always be a Title (beginning with an uppercase letter) and ending with (Year).

This is what I've got so far:

if(preg_match('/^\p{Lu}[\w%+\/-]+\([0-9]+\)/', $str)){
    error_log('MATCH');
}
else{
    error_log('NO MATCH');
}

This currently doesn't work. As I understand it, this is what it should do:

^\p{Lu} //match a word beginning with an uppercase letter

[\w%+\/-] //with any number of characters following it

+\([0-9]+\) //ending with an integer

Where am I going wrong?

The following regex should do it:

(?-i)(?<=[a-z]\s)[A-Z\d].*?\(\d+\)

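Explanation

  • (?-i) turns case-insensitive matching off, so the match is case sensitive
  • (?<=[a-z]\s) lookbehind: the match must be preceded by a lowercase letter and a whitespace character
  • [A-Z\d] matches an uppercase letter or a digit
  • .*? matches any characters, as few as possible
  • \(\d+\) matches one or more digits enclosed in parentheses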

PHP

<?php
$regex = '/(?-i)(?<=[a-z]\s)[A-Z\d].*?\(\d+\)/';
$str = 'A random string with movie titles in it.
       Movies like The Thing (1984) and other titles like Captain America Civil War (2016).
       The movies could be anywhere in this string.
       And some movies like 28 Days Later (2002) could start with a number.';
preg_match_all($regex, $str, $matches);
print_r($matches);
?>
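
As for why the pattern in the question logs NO MATCH, here is a minimal sketch on a shortened sample string (the variable names are only illustrative). The `^` anchors the match to the very start of the string, and the character class contains no whitespace, so it can never get past the space in front of "(" or span a multi-word title. Simply adding `\s` to the class does produce a match, but a greedy one that starts too early, which is why the title itself has to be constrained.

```php
<?php
$str = 'Movies like The Thing (1984) and 28 Days Later (2002).';

// Pattern from the question: anchored at the start, no whitespace allowed.
$anchored = preg_match('/^\p{Lu}[\w%+\/-]+\([0-9]+\)/', $str);
var_dump($anchored);   // int(0)

// Dropping the anchor is not enough: the space before "(" is not in the class.
$unanchored = preg_match('/\p{Lu}[\w%+\/-]+\([0-9]+\)/', $str);
var_dump($unanchored); // int(0)

// Allowing whitespace matches, but starts at the first uppercase letter.
$loose = preg_match('/\p{Lu}[\w\s%+\/-]+\([0-9]+\)/', $str, $m);
var_dump($loose);      // int(1)
echo $m[0], PHP_EOL;   // Movies like The Thing (1984)
```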

This regex does the job:

~(?:[A-Z][a-zA-Z]+\s+|\d+\s+)+\(\d+\)~

Explanation:

~               : regex delimiter
  (?:           : start non-capturing group
    [A-Z]       : 1 capital letter (use \p{Lu} if you want to match titles in any language)
    [a-zA-Z]+   : 1 or more letters (use \p{L} if you want to match titles in any language)
    \s+         : 1 or more spaces
   |            : OR
    \d+         : 1 or more digits
    \s+         : 1 or more spaces
  )+            : end group, repeated 1 or more times
  \(\d+\)       : 1 or more digits surrounded by parentheses (use \d{4} if the year is always 4 digits)
~               : regex delimiter
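
The \p{Lu} / \p{L} substitution mentioned in the comments can be sketched as follows. This assumes the `u` modifier and UTF-8 encoded input; the sample sentence and variable names are made up for illustration, and \d{4} assumes a four-digit year.

```php
<?php
$str = 'films like Amélie (2001) and Das Boot (1981)';

// ASCII-only classes: the accented letter breaks the first title.
$ascii = '~(?:[A-Z][a-zA-Z]+\s+|\d+\s+)+\(\d{4}\)~';
preg_match_all($ascii, $str, $asciiMatches);
print_r($asciiMatches[0]);   // only "Das Boot (1981)"

// Unicode property classes with the u modifier (UTF-8 pattern and subject).
$unicode = '~(?:\p{Lu}\p{L}+\s+|\d+\s+)+\(\d{4}\)~u';
preg_match_all($unicode, $str, $unicodeMatches);
print_r($unicodeMatches[0]); // "Amélie (2001)" and "Das Boot (1981)"
```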

Implementation:

$str = 'A random string with movie titles in it. 
Movies like The Thing (1984) and other titles like Captain America Civil War (2016). 
The movies could be anywhere in this string. 
And some movies like 28 Days Later (2002) could start with a number.';

if (preg_match_all('~(?:[A-Z][a-zA-Z]+\s+|\d+\s+)+\(\d+\)~', $str, $match)) {
    print_r($match);
    error_log('MATCH');
}
else{
    error_log('NO MATCH');
}

Result:

Array
(
    [0] => Array
        (
            [0] => The Thing (1984)
            [1] => Captain America Civil War (2016)
            [2] => 28 Days Later (2002)
        )

)
MATCH
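
To go from matching to the wrapping the question asks about, one option is preg_replace_callback. The following is a minimal sketch, not a definitive implementation: the function name linkMovieTitles and the /search URL are hypothetical, the title and year are captured in two added groups, and \d{4} assumes a four-digit year.

```php
<?php
// Hypothetical helper: wraps every "Title (Year)" match in an anchor tag.
// The /search URL is an assumption for illustration only.
function linkMovieTitles(string $text): string
{
    // Same pattern as above, with the title and the year captured separately.
    $pattern = '~((?:[A-Z][a-zA-Z]+\s+|\d+\s+)+)\((\d{4})\)~';

    $result = preg_replace_callback($pattern, function (array $m): string {
        $title = rtrim($m[1]); // drop the whitespace before "("
        $href  = '/search?' . http_build_query(['title' => $title, 'year' => $m[2]]);

        // Escape both the URL and the visible text before emitting HTML.
        return '<a href="' . htmlspecialchars($href) . '">'
             . htmlspecialchars($m[0]) . '</a>';
    }, $text);

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $text;
}

echo linkMovieTitles('Movies like The Thing (1984) and 28 Days Later (2002).'), PHP_EOL;
```

Note that any capitalized word directly in front of a title (for example "See The Thing (1984)") becomes part of the match, so the surrounding text still matters with this pattern.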
