
PHP Regex to find pattern and wrap in anchor tags

I have a string containing movie titles and release years. I want to detect the Title (Year) pattern and, when it matches, wrap it in anchor tags.

Wrapping it is easy. But is it possible to write a regex to match this pattern if I don't know what the name of the movie will be?

Example:

$str = 'A random string with movie titles in it. 
Movies like The Thing (1984) and other titles like Captain America Civil War (2016). 
The movies could be anywhere in this string. 
And some movies like 28 Days Later (2002) could start with a number.';

So the pattern will always be a Title (starting with an uppercase letter or, as in the last example, a number) followed by (Year).

This is what I have so far:

if(preg_match('/^\p{Lu}[\w%+\/-]+\([0-9]+\)/', $str)){
    error_log('MATCH');
}
else{
    error_log('NO MATCH');
}

This currently does not work. From what I understand, this is what it should do:

^\p{Lu} //match a word beginning with an uppercase letter

[\w%+\/-] //with any number of characters following it

+\([0-9]+\) //ending with an integer

Where am I going wrong with this?
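A quick check of the attempt above suggests two separate problems: `^` anchors the match at the very start of the string, and the character class contains no whitespace, so neither a multi-word title nor the space before `(` can match. The sketch below uses a shortened sample string (an assumption for brevity) and shows that simply adding `\s` to the class over-matches instead:

```php
<?php
$str = 'Movies like The Thing (1984) and 28 Days Later (2002).';

// Original attempt: anchored at the start, no whitespace allowed in the title.
$anchored = preg_match('/^\p{Lu}[\w%+\/-]+\([0-9]+\)/', $str);

// Dropping ^ alone is not enough: the space before "(" still cannot match.
$unanchored = preg_match('/\p{Lu}[\w%+\/-]+\([0-9]+\)/', $str);

// Allowing whitespace in the class matches, but swallows the preceding words.
preg_match('/\p{Lu}[\w\s%+\/-]+\([0-9]+\)/', $str, $m);

var_dump($anchored);   // int(0)
var_dump($unanchored); // int(0)
var_dump($m[0]);       // string(28) "Movies like The Thing (1984)"
```

The over-match in the last case is why a working pattern has to describe the title word by word, or restrict where a title may begin.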

The following regex should do it:

(?-i)(?<=[a-z]\s)[A-Z\d].*?\(\d+\)

Explanation

  • (?-i) turns off case-insensitive matching, so the pattern is case-sensitive (this is already the default for PHP's preg functions)
  • (?<=[a-z]\s) look-behind for a lower-case letter followed by a whitespace character
  • [A-Z\d] match an upper-case letter or a digit
  • .*? lazily match any characters (except newlines), as few as possible
  • \(\d+\) match one or more digits enclosed in literal parentheses
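One consequence of the look-behind is worth checking before relying on this pattern: a title is only picked up in full when a lower-case word and a space come directly before it. A minimal sketch with made-up test sentences (not from the question):

```php
<?php
$regex = '/(?-i)(?<=[a-z]\s)[A-Z\d].*?\(\d+\)/';

// Title preceded by a lower-case word: matched in full.
preg_match_all($regex, 'I like The Thing (1984).', $mid);

// Title at the very start: the first word has nothing before it,
// so the match begins at the second word.
preg_match_all($regex, 'The Thing (1984) is a classic.', $start);

// Single-word title at the very start: no match at all.
preg_match_all($regex, 'Alien (1979) is a classic.', $single);

print_r($mid[0]);    // [0] => The Thing (1984)
print_r($start[0]);  // [0] => Thing (1984)
print_r($single[0]); // empty array
```

If titles can appear at the start of the text or right after punctuation, the look-behind condition would need to be relaxed.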


PHP

<?php
$regex = '/(?-i)(?<=[a-z]\s)[A-Z\d].*?\(\d+\)/';
$str = 'A random string with movie titles in it.
       Movies like The Thing (1984) and other titles like Captain America Civil War (2016).
       The movies could be anywhere in this string.
       And some movies like 28 Days Later (2002) could start with a number.';
preg_match_all($regex, $str, $matches);
print_r($matches);
?>

This regex does the job:

~(?:[A-Z][a-zA-Z]+\s+|\d+\s+)+\(\d+\)~

Explanation:

~               : regex delimiter
  (?:           : start non capture group
    [A-Z]       : 1 capital letter (use \p{Lu} to match titles in any language)
    [a-zA-Z]+   : 1 or more letters (use \p{L} to match titles in any language)
    \s+         : 1 or more spaces
   |            : OR
    \d+         : 1 or more digits
    \s+         : 1 or more spaces
  )+            : end group, repeated 1 or more times
  \(\d+\)       : 1 or more digits surrounded by parentheses (use \d{4} if the year is always 4 digits)
~               : regex delimiter
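The substitutions suggested in the comments above can be combined into one variant. This is a sketch, not part of the original answer: the `u` modifier is an addition that makes PCRE treat the pattern and subject as UTF-8, and the sample string is made up for illustration.

```php
<?php
// \p{Lu} and \p{L} replace the ASCII-only classes, \d{4} pins the year to 4 digits.
// The u modifier (an assumption added here) enables UTF-8 mode.
$regex = '~(?:\p{Lu}\p{L}+\s+|\d+\s+)+\(\d{4}\)~u';

// Illustrative input containing accented letters.
$str = 'French films: La Haine (1995), Amélie (2001) and Élisa (1995).';

preg_match_all($regex, $str, $match);
print_r($match[0]);
// [0] => La Haine (1995)
// [1] => Amélie (2001)
// [2] => Élisa (1995)
```

With the ASCII-only classes, "Amélie" and "Élisa" would not match, because `é` and `É` fall outside `[a-zA-Z]`.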

Implementation:

$str = 'A random string with movie titles in it. 
Movies like The Thing (1984) and other titles like Captain America Civil War (2016). 
The movies could be anywhere in this string. 
And some movies like 28 Days Later (2002) could start with a number.';

if (preg_match_all('~(?:[A-Z][a-zA-Z]+\s+|\d+\s+)+\(\d+\)~', $str, $match)) {
    print_r($match);
    error_log('MATCH');
}
else{
    error_log('NO MATCH');
}

Result:

Array
(
    [0] => Array
        (
            [0] => The Thing (1984)
            [1] => Captain America Civil War (2016)
            [2] => 28 Days Later (2002)
        )

)
MATCH
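To go from matching to the wrapping the question asks about, the same pattern can be passed to `preg_replace_callback`. A minimal sketch: the `/search?q=` URL scheme is hypothetical (the question does not say where the links should point), and the sample string is shortened.

```php
<?php
$str = 'Movies like The Thing (1984) and 28 Days Later (2002) are classics.';

// Same pattern as in the implementation above.
$regex = '~(?:[A-Z][a-zA-Z]+\s+|\d+\s+)+\(\d+\)~';

$linked = preg_replace_callback($regex, function (array $m): string {
    $title = $m[0];
    // Hypothetical link target; replace with the real URL scheme.
    $href = '/search?q=' . urlencode($title);
    // Escape both the attribute value and the link text for HTML output.
    return '<a href="' . htmlspecialchars($href) . '">' . htmlspecialchars($title) . '</a>';
}, $str);

echo $linked, PHP_EOL;
```

A callback is used rather than a plain `preg_replace` with `$0` so that the title can be URL-encoded and HTML-escaped before it is inserted.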
