Regular expression for replacing <a> tags

Question

I'm new to regular expressions, but I'm trying to learn them. I want to remove the tags from an HTML text and keep only the inner text. Something like this:

Original: Lorem ipsum <a href="http://www.google.es">Google</a> Lorem ipsum <a href="http://www.bing.com">Bing</a>
Result:  Lorem ipsum Google Lorem ipsum Bing

I'm using this code:

$patterns = array( "/(<a href=\"[a-z0-9.:_\-\/]{1,}\">)/i", "/<\/a>/i");
$replacements = array("", "");

$text = 'Lorem ipsum <a href="http://www.google.es">Google</a> Lorem ipsum <a href="http://www.bing.com">Bing</a>';
$text = preg_replace($patterns,$replacements,$text);

It works, but I don't know whether this code is the most efficient or the most readable.

Can I improve the code in some way?

Answer 1

In your case, PHP's strip_tags() should do exactly what you need without regular expressions. If you want to strip only a specific tag (something strip_tags() can't do by default), there is a function in the User Contributed Notes.
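As a minimal sketch of the strip_tags() suggestion, the snippet below runs it on the sample text from the question. The second input string and the variable names are made up for illustration. Note that the optional second argument lists tags to keep, which is why removing only one specific tag needs a different approach.

```php
<?php
// Sample input taken from the question.
$text = 'Lorem ipsum <a href="http://www.google.es">Google</a> Lorem ipsum <a href="http://www.bing.com">Bing</a>';

// strip_tags() removes every HTML tag and keeps the text between them.
$plain = strip_tags($text);
echo $plain, "\n"; // Lorem ipsum Google Lorem ipsum Bing

// Hypothetical input with more than one kind of tag.
$mixed = '<p>See <a href="http://www.bing.com">Bing</a> for <b>more</b>.</p>';

// The second argument is a whitelist: these tags are kept, all others are removed.
$keepBold = strip_tags($mixed, '<b>');
echo $keepBold, "\n"; // See Bing for <b>more</b>.
```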

In general, regexes are not suitable for parsing HTML. It's better to use a DOM parser like Simple HTML DOM or one of PHP's built-in parsers.
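One way the DOM approach could look with PHP's built-in DOMDocument is sketched below. The input string, the wrapping <div>, and the variable names are assumptions made for this example, and the sketch flattens any markup nested inside an <a> to plain text. Unlike strip_tags(), it removes only the anchors and leaves other tags in place.

```php
<?php
// Hypothetical input: an anchor with an extra attribute plus an unrelated tag.
$html = 'Lorem ipsum <a href="http://www.google.es" target="_blank">Google</a> and <b>bold</b>';

$doc = new DOMDocument();
// Wrap the fragment in a <div> so its contents can be read back afterwards.
$doc->loadHTML('<div>' . $html . '</div>');

// Copy the live node list into an array before modifying the tree.
$anchors = iterator_to_array($doc->getElementsByTagName('a'));
foreach ($anchors as $anchor) {
    // Replace each <a> element with a text node holding its text content.
    $textNode = $doc->createTextNode($anchor->textContent);
    $anchor->parentNode->replaceChild($textNode, $anchor);
}

// Serialize the children of the wrapper back into an HTML string.
$wrapper = $doc->getElementsByTagName('div')->item(0);
$result = '';
foreach ($wrapper->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}
echo $result, "\n";
```

Because the parser handles attributes and whitespace itself, the extra target attribute needs no special treatment here.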

Answer 2

Don't use a regular expression; use a DOM parser instead.

Answer 3

If your content only contains anchor tags, then strip_tags is probably easier to use.

Your preg_replace won't match the opening tag if there are extra spaces between a and href, or if the tag contains any other attributes.
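To make these two failure cases concrete, here is a small sketch that runs the patterns from the question against two made-up inputs, one with a double space before href and one with an extra target attribute. In both cases only the closing tag is removed and the opening tag is left behind.

```php
<?php
// Patterns and replacements copied from the question.
$patterns = array("/(<a href=\"[a-z0-9.:_\-\/]{1,}\">)/i", "/<\/a>/i");
$replacements = array("", "");

// Hypothetical inputs that differ slightly from the question's sample.
$extraSpace = 'See <a  href="http://www.bing.com">Bing</a>';
$extraAttr  = 'See <a href="http://www.bing.com" target="_blank">Bing</a>';

$outSpace = preg_replace($patterns, $replacements, $extraSpace);
$outAttr  = preg_replace($patterns, $replacements, $extraAttr);

echo $outSpace, "\n"; // See <a  href="http://www.bing.com">Bing
echo $outAttr, "\n";  // See <a href="http://www.bing.com" target="_blank">Bing
```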

Answer 4

In this case, using regex is not a good idea. Having said that:

<?php
    $text = 'Lorem ipsum <a href="http://www.google.es">Google</a> Lorem ipsum <a href="http://www.bing.com">Bing</a>';
    $text = preg_replace(
        '@\\<a\\b[^\\>]*\\>(.*?)\\<\\/a\\b[^\\>]*\\>@',
        '\\1',
        $text
    );
    echo $text;
    // Lorem ipsum Google Lorem ipsum Bing
?>

This is a very trivial regex; it's not bulletproof.
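Two inputs where a pattern like this one falls short are uppercase tags and anchor text that spans several lines. The sketch below uses an equivalent pattern without the redundant escapes, then adds the i and s modifiers as one possible adjustment. The inputs and variable names are made up for this example, and the adjusted pattern is still not a general HTML parser.

```php
<?php
// Same pattern as in the answer, written without the redundant backslashes.
$basic = '@<a\b[^>]*>(.*?)</a\b[^>]*>@';
// Variant with i (case-insensitive) and s (dot also matches newlines).
$loose = '@<a\b[^>]*>(.*?)</a\b[^>]*>@is';

// Hypothetical inputs: uppercase tags, and link text broken across two lines.
$upper = '<A HREF="http://www.bing.com">Bing</A>';
$multi = "<a href=\"http://www.bing.com\">Bing\nSearch</a>";

// Without modifiers neither input matches, so both come back unchanged.
$upperBasic = preg_replace($basic, '$1', $upper);
$multiBasic = preg_replace($basic, '$1', $multi);

// With the modifiers both anchors are reduced to their inner text.
$upperLoose = preg_replace($loose, '$1', $upper);
$multiLoose = preg_replace($loose, '$1', $multi);

echo $upperLoose, "\n"; // Bing
echo $multiLoose, "\n"; // Bing and Search on separate lines
```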

Answer 5

You can't parse [X]HTML with regular expressions.

5 answers

Answer 1: 7, accepted, 2010-08-03 11:01:41
Answer 2: 5, 2010-08-03 11:02:19
Answer 3: 2, 2010-08-03 11:03:14
Answer 4: 2, 2010-08-03 11:47:09
Answer 5: 0, 2010-08-03 11:04:23
