Regular expression pattern matching

Question

From a string containing HTML content, I want to extract the text between the first occurrence of an <a> tag and the first occurrence of a <span> tag.

My pattern is as follows:

$pattern='/<a[^(span)][\/\(\)-:@!%*>#=_|?$&";.\w\s]+<\/a> <span/um';

The output I get is the text between the first occurrence of <a and the last occurrence of <span, not the text between the first occurrence of each.

For example, given this HTML content:

<a href="#">asdasdasd</a> <span blah blah></span> blah blah <a>blah  </a> <span>blah

Want:

<a href="#">asdasdasd</a> <span

Getting:

<a href="#">asdasdasd</a> <span blah blah></span> blah blah <a>blah  </a> <span

Answer 1

Use an HTML parser for parsing HTML.
Use a lazy quantifier: '/<a[^(span)][\/\(\)-:@!%*>#=_|?$&";.\w\s]+?<\/a> <span/um';
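
As a rough illustration of the parser suggestion, the sketch below uses PHP's bundled DOM extension to pull out the first <a> element instead of matching it with a regex. The helper name firstLinkText is hypothetical, and the input is the example string from the question; the sample markup is not valid HTML, so libxml warnings are suppressed.

```php
<?php
// Hypothetical helper: returns the text inside the first <a> element, or null if none exists.
function firstLinkText(string $html): ?string
{
    $doc = new DOMDocument();
    // Collect parse warnings internally instead of emitting them; the sample markup is malformed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $firstLink = $doc->getElementsByTagName('a')->item(0);
    return $firstLink === null ? null : $firstLink->textContent;
}

// Example string taken from the question.
$html = '<a href="#">asdasdasd</a> <span blah blah></span> blah blah <a>blah  </a> <span>blah';
echo firstLinkText($html), "\n";
```

If the markup of the element itself is needed rather than its text, DOMDocument::saveHTML() accepts a node argument and serializes just that node.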

Answer 2

You need to make the regular expression lazy rather than greedy, by telling it to match as few characters as possible between <a and <span with .+?:

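// .+? is lazy: it stops at the first <span it can reach
$ptn = '/<a.+?<span/';
$str = '<a href="#">asdasdasd</a> <span blah blah></span> blah blah <a>blah  </a> <span>blah';
// preg_match returns 1 only when a match was found
if (preg_match($ptn, $str, $matches)) {
    echo $matches[0];
}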

The result is <a href="#">asdasdasd</a> <span
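
To make the difference concrete, here is a small sketch that runs the greedy and lazy forms of the same pattern against the question's example string. The helper firstMatch is a hypothetical convenience wrapper around preg_match, not something from the original answers.

```php
<?php
// Hypothetical wrapper: returns the full match, or null when the pattern does not match.
function firstMatch(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

// Example string taken from the question.
$str = '<a href="#">asdasdasd</a> <span blah blah></span> blah blah <a>blah  </a> <span>blah';

// Greedy: .+ consumes as much as possible, then backtracks to the LAST <span.
$greedy = firstMatch('/<a.+<span/', $str);
// Lazy: .+? consumes as little as possible, so it stops at the FIRST <span.
$lazy = firstMatch('/<a.+?<span/', $str);

echo "greedy: $greedy\n";
echo "lazy:   $lazy\n";
```

The only change between the two patterns is the ? after the +, which is what switches the match from the last <span to the first one.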

2 answers

Answer 1
1 accepted 2012-10-21 00:06:50

Answer 2
0 2012-10-21 00:15:35
