Regex not returning multiple matches

Question

I am using TinyMCE, and on cleanup it converts all the single quotes around my attribute values to double quotes.

This is what I put into the editor:

<tr _excel-dimensions='{"row":{"rowHeight":50}}'>
<td _excel-styles='{"font":{"size":20,"color":{"rgb":"333333"},"bold":true},"fill":{"fillType":"solid","startColor":"F0F0F0"},"alignment":{"horizontal":"center"}}' colspan='6'>Affiliate Accounts</td>
</tr>

And this is what the editor produces after saving:

<tr _excel-dimensions="{&quot;row&quot;:{&quot;rowHeight&quot;:50}}">
<td _excel-styles="{&quot;font&quot;:{&quot;size&quot;:20,&quot;color&quot;:{&quot;rgb&quot;:&quot;333333&quot;},&quot;bold&quot;:true},&quot;fill&quot;:{&quot;fillType&quot;:&quot;solid&quot;,&quot;startColor&quot;:&quot;F0F0F0&quot;},&quot;alignment&quot;:{&quot;horizontal&quot;:&quot;center&quot;}}" colspan="6">Accounts</td>
</tr>

There doesn't seem to be a way to override this setting in TinyMCE, so I am turning to regex in PHP when saving the data to the database. This is what I have so far, but it doesn't seem to capture all the double quotes.

$content = preg_replace_callback('/<(.*)(\")(.*)(\")(.*)>/miU', function($matches) {
  return "<" . $matches[1] . "'" . html_entity_decode($matches[3]) . "'" . $matches[5] . ">";
}, $content);

It replaces the JSON-encoded string, but not colspan="6".

Thanks in advance for the help.

Answer 1

As I said in the comment, parsing HTML with regex is not a good idea; it is better to use a dedicated library such as PHP Simple HTML DOM Parser. However, it is possible to construct a regex that works on well-formed HTML.

Our goal is to find all double-quoted strings inside a tag. First, let's set aside the requirement that the double-quoted string must be inside a tag. Then we can use this:

$content = preg_replace_callback('/"(.*?)"/',
  function($matches) {
    // Decode &quot; and other entities, then wrap the value in single quotes.
    return "'" . html_entity_decode($matches[1]) . "'";
  },
  $content);
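To see why this first pattern is not enough on its own, here is a small sketch run on a made-up fragment (the `$html` string and the `$naive` variable are illustrative, not from the question). It rewrites the attribute quotes, but it also rewrites the quotes in the cell text:

```php
<?php
// Hypothetical input: one quoted attribute plus quoted text outside any tag.
$html = '<td colspan="6">He said "hi"</td>';

// The unrestricted pattern matches every double-quoted run it finds.
$naive = preg_replace_callback('/"(.*?)"/', function ($matches) {
    return "'" . html_entity_decode($matches[1]) . "'";
}, $html);

echo $naive, "\n";
// prints: <td colspan='6'>He said 'hi'</td>
```

The attribute is converted as intended, but so is the quoted text in the cell, which is what the "inside a tag" check is meant to prevent.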

Now we need to add the check that the double-quoted string is inside a tag. To do this, we construct a lookahead expression that checks the text from the opening double quote to the end of the text:

1. There must be a tag-closing > ahead. That is, there must be some sequence of non-< and non-> characters followed by >. The corresponding regex is [^<>]*>
2. It must be followed by any number of complete tags, each opened by < and closed by >. The regex for a run of characters containing a single tag is [^<]*<[^>]*>. We need to repeat this group any number of times: (?:[^<]*<[^>]*>)*
3. There may be some non-< and non-> characters before the end of the text: [^<>]*$
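The three pieces can be assembled and tried out in isolation. The following is only a sketch: the variable names and the sample `$html` fragment are made up for illustration. It reports the offsets of the double quotes that pass the lookahead:

```php
<?php
// The three parts described above (names are illustrative).
$closesCurrentTag = '[^<>]*>';            // rest of the current tag up to its >
$completeTags     = '(?:[^<]*<[^>]*>)*';  // any number of complete tags
$trailingText     = '[^<>]*$';            // optional text up to the end

$lookahead = '(?=' . $closesCurrentTag . $completeTags . $trailingText . ')';

// Hypothetical input: quotes at offsets 12 and 14 are inside the tag,
// quotes at offsets 24 and 27 are in the text content.
$html = '<td colspan="6">He said "hi"</td>';

preg_match_all('/"' . $lookahead . '/', $html, $m, PREG_OFFSET_CAPTURE);
$offsets = array_column($m[0], 1);

print_r($offsets);
// Only the two quotes inside the tag pass: offsets 12 and 14.
```

The quotes around hi are rejected because the first thing that follows them is a <, not a >, so the first part of the lookahead cannot match.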

The resulting lookahead expression looks a bit terrifying, but it does the job: (?=[^<>]*>(?:[^<]*<[^>]*>)*[^<>]*$).

Finally, we incorporate this lookahead check into the original regex:

$content = preg_replace_callback('/"(?=[^<>]*>(?:[^<]*<[^>]*>)*[^<>]*$)(.*?)"/',
  function($matches) {
    // Only quotes that pass the lookahead (i.e. inside a tag) reach this point.
    return "'" . html_entity_decode($matches[1]) . "'";
  },
  $content);
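As a quick check, the pattern can be wrapped in a helper and run on a shortened version of the markup from the question. This is a sketch only: the function name `singleQuoteAttributes`, the `$saved` sample, and the quoted cell text are made up for illustration. It assumes well-formed HTML with no < or > inside attribute values.

```php
<?php
// Hypothetical helper around the final pattern from the answer.
function singleQuoteAttributes(string $html): string
{
    $pattern = '/"(?=[^<>]*>(?:[^<]*<[^>]*>)*[^<>]*$)(.*?)"/';

    $result = preg_replace_callback($pattern, function ($matches) {
        return "'" . html_entity_decode($matches[1]) . "'";
    }, $html);

    // preg_replace_callback returns null on failure (for example when the
    // backtrack limit is hit); fall back to the untouched input in that case.
    return $result ?? $html;
}

// Shortened sample based on the saved markup, with quoted text added to the cell.
$saved = '<tr _excel-dimensions="{&quot;row&quot;:{&quot;rowHeight&quot;:50}}">'
       . '<td colspan="6">Say "hi"</td></tr>';

$fixed = singleQuoteAttributes($saved);
echo $fixed, "\n";
// prints: <tr _excel-dimensions='{"row":{"rowHeight":50}}'><td colspan='6'>Say "hi"</td></tr>
```

Both the JSON attribute and colspan are converted, while the quotes in the cell text are left alone. Because the lookahead scans to the end of the text for every candidate quote, very large documents may be slow to process.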

You can check it here: Regex101 demo
