Regular expression pattern match

From a string containing HTML content, I want to extract the text between the first occurrence of an <a> tag and the first occurrence of a <span> tag.

My pattern is as follows:

$pattern='/<a[^(span)][\/\(\)-:@!%*>#=_|?$&";.\w\s]+<\/a> <span/um';

The output I get is the text between the first occurrence of <a and the last occurrence of <span, not the text between the first occurrence of each.

For example, given this HTML content:

<a href="#">asdasdasd</a> <span blah blah></span> blah blah <a>blah  </a> <span>blah

Want:

<a href="#">asdasdasd</a> <span

Getting:

<a href="#">asdasdasd</a> <span blah blah></span> blah blah <a>blah  </a> <span

  1. Use an HTML parser for parsing HTML.
  2. Use a lazy quantifier: '/<a[^(span)][\/\(\)-:@!%*>#=_|?$&";.\w\s]+?<\/a> <span/um';
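
As a rough illustration of the first suggestion, here is a minimal sketch using PHP's bundled DOM extension. The helper name firstAnchorHtml is hypothetical (it is not from the question), and the sketch assumes the input is a non-empty HTML fragment. It returns only the first <a> element itself, so it covers the "first link" part of the problem rather than reproducing the regex match character for character.

```php
<?php
// Hypothetical helper: returns the outer HTML of the first <a> element,
// or null when the fragment contains no <a> element.
function firstAnchorHtml(string $html): ?string
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid; collect libxml warnings
    // internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $anchor = $doc->getElementsByTagName('a')->item(0);

    return $anchor === null ? null : $doc->saveHTML($anchor);
}

$str = '<a href="#">asdasdasd</a> <span blah blah></span> blah blah <a>blah  </a> <span>blah';
echo firstAnchorHtml($str), "\n";
```

Unlike a regular expression, the parser does not depend on which characters appear inside the tag, so there is no character class to keep up to date.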

You need to make the regular expression lazy rather than greedy by telling it to match as few characters as possible between <a and <span, using .+? :

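// .+? is lazy: it stops at the first "<span" it can reach
$ptn = '/<a.+?<span/';
$str = '<a href="#">asdasdasd</a> <span blah blah></span> blah blah <a>blah  </a> <span>blah';
preg_match($ptn, $str, $matches);
// $matches[0] holds the full matched text
echo $matches[0];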

The result is <a href="#">asdasdasd</a> <span
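
To make the difference concrete, the sketch below runs a greedy and a lazy version of the same pattern against the example string from the question. The variable names $greedy and $lazy are only illustrative. Two caveats: `.` does not match newlines unless the `s` modifier is added, so multi-line HTML would need `/<a.+?<span/s`, and `<a` also matches the start of other tags such as `<abbr`.

```php
<?php
$str = '<a href="#">asdasdasd</a> <span blah blah></span> blah blah <a>blah  </a> <span>blah';

// Greedy: .+ consumes as much as it can, then backtracks
// only as far as the LAST "<span" in the string.
preg_match('/<a.+<span/', $str, $greedy);

// Lazy: .+? grows one character at a time and stops
// at the FIRST "<span" it reaches.
preg_match('/<a.+?<span/', $str, $lazy);

echo $greedy[0], "\n"; // runs up to the last "<span"
echo $lazy[0], "\n";   // <a href="#">asdasdasd</a> <span
```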
