Regex to match multiple words wrapped in HTML span tags

Question

I'm working on a regex to match phrases in an HTML string. For example, I want to find every instance of "artificial intelligence" and return the <span> tag that immediately precedes it.

The trouble is that my regex returns only one large match.

Here is a link to an online regex builder I've been using: https://regex101.com/r/rK9yO9/1

I am looking to return the following two matches:

<span m='3'>
<span m='13'>

Example string:

<p><span m='2'>of</span> <span m='3'>artificial</span> 
<span m='4'>intelligence.</span><span m='4'>So</span> 
<span m='5'>that</span> <span m='6'>seems</span> 
<span m='9'>good.</span> <span m='10'>The</span> 
<span m='11'>impact</span> <span m='12'>of</span> 
<span m='13'>artificial</span> <span m='14'>intelligence,</span> 
<span m='15'>on</span> </p>

N.B. The actual text contains no newlines; I added them here for readability.

The regex I have so far is:

(<span.*>)artificial.?<\/span>.?<span.*>intelligence.?<\/span>

This returns the following single match:

<span m='2'>of</span> <span m='3'>artificial</span> 
<span m='4'>intelligence.</span><span m='4'>So</span> 
<span m='5'>that</span> <span m='6'>seems</span> 
<span m='9'>good.</span> <span m='10'>The</span> 
<span m='11'>impact</span> <span m='12'>of</span> 
<span m='13'>artificial</span> <span m='14'>intelligence,</span>

Answer 1

Your quantifiers are greedy, so each .* consumes as much text as it can. To make matching stop at the first occurrence, make them lazy by appending ? (that is, .*? instead of .*):

(<span.*?>)artificial.?<\/span>.?<span.*?>intelligence.?<\/span>

This will match:

'<span m='2'>of</span> <span m='3'>artificial</span> <span m='4'>intelligence.</span>'

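You can then read the first capture group from the match. Be aware that the match still begins at the earliest <span, so that group can hold more than the one tag directly before "artificial".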
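To see what the first group actually captures, here is a minimal sketch in Python (the language is my choice; the thread does not name one). It runs the lazy pattern from this answer against the question's example string, then a variant that uses [^>]* in place of .*? so the group cannot run past a single tag. The variant and the names lazy, bounded, lazy_groups and bounded_groups are illustrative assumptions, not part of the original answer.

```python
import re

# The example string from the question, joined onto one line.
html = (
    "<p><span m='2'>of</span> <span m='3'>artificial</span> "
    "<span m='4'>intelligence.</span><span m='4'>So</span> "
    "<span m='5'>that</span> <span m='6'>seems</span> "
    "<span m='9'>good.</span> <span m='10'>The</span> "
    "<span m='11'>impact</span> <span m='12'>of</span> "
    "<span m='13'>artificial</span> <span m='14'>intelligence,</span> "
    "<span m='15'>on</span> </p>"
)

# Lazy pattern from the answer ("/" needs no escaping in Python).
lazy = r"(<span.*?>)artificial.?</span>.?<span.*?>intelligence.?</span>"

# Assumed variant: [^>]* cannot cross a ">", so group 1 is exactly one tag.
bounded = r"(<span[^>]*>)artificial.?</span>.?<span[^>]*>intelligence.?</span>"

# With a single capture group, findall returns the group text per match.
lazy_groups = re.findall(lazy, html)
bounded_groups = re.findall(bounded, html)

print(lazy_groups[0])   # <span m='2'>of</span> <span m='3'>
print(bounded_groups)   # ["<span m='3'>", "<span m='13'>"]
```

The lazy version finds two matches, but each one starts at the leftmost available <span, so its group drags along the tags before the one of interest. Restricting the group to non-">" characters appears to give the two tags the question asks for.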

Answer 2

Try this regex:

/(<span[^<]+?>(?:artificial|intelligence\.)<\/span>)/gm

See DEMO

It should match only the selected tags.
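As a rough check of what this pattern selects, the sketch below applies it in Python to the question's example string. It assumes the second alternative is meant to read intelligence\. (with the final "e"), and the names per_word and tags are illustrative only.

```python
import re

# The example string from the question, joined onto one line.
html = (
    "<p><span m='2'>of</span> <span m='3'>artificial</span> "
    "<span m='4'>intelligence.</span><span m='4'>So</span> "
    "<span m='5'>that</span> <span m='6'>seems</span> "
    "<span m='9'>good.</span> <span m='10'>The</span> "
    "<span m='11'>impact</span> <span m='12'>of</span> "
    "<span m='13'>artificial</span> <span m='14'>intelligence,</span> "
    "<span m='15'>on</span> </p>"
)

# Assumed spelling of the answer's pattern; the g/m flags are not needed
# here because findall already scans the whole string.
per_word = r"(<span[^<]+?>(?:artificial|intelligence\.)</span>)"

tags = re.findall(per_word, html)
for tag in tags:
    print(tag)
# <span m='3'>artificial</span>
# <span m='4'>intelligence.</span>
# <span m='13'>artificial</span>
```

Each word is matched as its own span, so this returns whole elements rather than just the opening tag before the phrase. Because \. requires a literal full stop, the span containing "intelligence," (with a comma) is not matched.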

2 answers

Answer 1
2 2016-03-22 10:37:08

Answer 2
1 Accepted 2016-03-22 11:00:39
