PHP regular expression help
I am using preg_replace to strip out <p> tags and <li> tags and turn them into carriage returns. I have some <a> tags in my string, and I want to strip those out too, but keep the href attribute. For instance, if I have: <a href = "http://www.example.com">Click Here</a>, what I want is: http://www.example.com Click Here
Here is what I have so far:
$text .= preg_replace(array("/<p[^>]*>/iU","/<\/p[^>]*>/iU","/<ul[^>]*>/iU","/<\/ul[^>]*>/iU","/<li[^>]*>/iU","/<\/li[^>]*>/iU"), array("","\r\n\r\n","","\r\n\r\n","","\r\n"), $content);
Thanks
If I were you I would use SimpleHTMLDom. Here's a usage example from the docs:
// Create DOM from string
$html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');
$html->find('div', 1)->class = 'bar';
$html->find('div[id=hello]', 0)->innertext = 'foo';
echo $html;
// Output: <div id="hello">foo</div><div id="world" class="bar">World</div>
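For the anchor case in the question, a parser-based approach could look like the sketch below. Note that it swaps in PHP's bundled DOM extension (DOMDocument) rather than SimpleHTMLDom, so it is not taken from the SimpleHTMLDom docs. The helper name anchors_to_text() is hypothetical, and encoding handling is left out on the assumption that the input is plain ASCII.

```php
<?php
// Sketch only: uses PHP's bundled DOM extension rather than SimpleHTMLDom.
// anchors_to_text() is a hypothetical helper name, not part of either library.
function anchors_to_text(string $html): string
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a container so it can be read back out afterwards.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list first, because replacing nodes changes it.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $a) {
        // Replace each anchor with "href value, space, link text".
        $text = trim($a->getAttribute('href') . ' ' . $a->textContent);
        $a->parentNode->replaceChild($doc->createTextNode($text), $a);
    }

    // Serialize only the children of the wrapper, which is the first div.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo anchors_to_text('<a href = "http://www.example.com">Click Here</a>'), PHP_EOL;
```

The result is still HTML, so characters such as & in a URL come back escaped; the remaining <p> and <li> handling would run on this output afterwards.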
If a regex solution is desired, here is a tested function which handles the anchor tags as you requested (with the caveats noted below). The regex is presented in verbose mode with comments:
function process_markup($content) {
    return preg_replace(
        array( // Regex patterns
            '%<(?:p|ul|li)\b[^>]*>%i',       // Open tags (\b stops <pre>, <link> etc. from matching).
            '%<\/(?:p|ul|li)\b[^>]*>\s*%i',  // Close tags plus any trailing whitespace.
            '%                # Match A element (with no "<>" in attributes!)
            <a\b              # Start tag name.
            [^>]+?            # Anything up to HREF attribute.
            href\s*=\s*       # HREF attribute name and "="
            (["\']?)          # $1: Optional quote delimiter
            ([^>\s]+)         # $2: HREF attribute value.
            (?(1)\1)          # If open quote, match close quote.
            [^>]*>            # Remainder of start tag
            (.*?)             # $3: A element contents.
            </a\s*>           # A element end tag.
            %ix'
        ),
        array( // Replacement strings
            "",      // Simply strip P, UL, and LI open tags.
            "\r\n",  // Replace close tags with line endings.
            "$2 $3"  // Keep A element HREF value and contents.
        ),
        $content
    );
}
I took the liberty of modifying the other regexes as well. Adjust as necessary.
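As a quick check, the anchor pattern can be run on its own against the example from the question. The snippet below is a minimal sketch that restates that pattern in compact, non-verbose form so it runs standalone; the variable names are just for illustration.

```php
<?php
// Compact, non-verbose form of the anchor pattern used in process_markup().
$anchor = '%<a\b[^>]+?href\s*=\s*(["\']?)([^>\s]+)(?(1)\1)[^>]*>(.*?)</a\s*>%i';

// The example from the question, including the spaces around "=".
$input = '<a href = "http://www.example.com">Click Here</a>';

// $2 is the href value and $3 is the link text.
$output = preg_replace($anchor, '$2 $3', $input);

echo $output, PHP_EOL; // http://www.example.com Click Here
```

The replacement string is single-quoted here only to make it obvious that $2 and $3 are regex backreferences, not PHP variables.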
CAVEATS: This regex solution assumes two things. First, no A, P, UL or LI element has angle brackets <> in its attributes. Second, no A, P, UL or LI start or end tags appear within any CDATA sections such as SCRIPT or STYLE elements, within HTML comments, or inside other start tag attributes. Otherwise, this should work pretty well for a lot of HTML markup.
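To make the first caveat concrete, here is a small sketch of what happens when an attribute value contains a ">". It uses a single open-tag pattern modeled on the one in process_markup(); the sample strings and variable names are made up for illustration.

```php
<?php
// Open-tag pattern modeled on the one in process_markup().
$open = '%<(?:p|ul|li)\b[^>]*>%i';

$clean  = '<p title="intro">Text</p>';  // no angle bracket in the attribute
$tricky = '<p title="a > b">Text</p>';  // ">" inside the attribute value

$cleanResult  = preg_replace($open, '', $clean);
$trickyResult = preg_replace($open, '', $tricky);

echo $cleanResult, PHP_EOL;  // Text</p>
echo $trickyResult, PHP_EOL; //  b">Text</p>  (the tag was cut short at the first ">")
```

Because [^>]* stops at the first ">", the rest of the attribute leaks into the output. Input like this is where a real HTML parser is the safer choice.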
I realize that many wince when they hear the words HTML and REGEX spoken in the same breath, but in this particular case, I think a regex solution will work quite well (within the above limitations). The A tag is one of those which is not nested, so a regex can easily match the start tag, contents and end tag all in one whack. The same goes for the individual start and end tags of the other elements (which can be nested) when they are considered independently.
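The point about nested elements can be seen with a list inside a list item. The sketch below uses open- and close-tag patterns modeled on the ones in process_markup(); the sample markup is invented for illustration. Each tag is replaced on its own, so the regex never has to pair an opening tag with its closing tag.

```php
<?php
// Patterns modeled on the open- and close-tag regexes in process_markup().
$patterns     = array('%<(?:p|ul|li)\b[^>]*>%i', '%</(?:p|ul|li)\b[^>]*>\s*%i');
$replacements = array('', "\r\n");

// A list nested inside a list item.
$nested = '<ul><li>Fruit<ul><li>Apple</li><li>Pear</li></ul></li></ul>';

// $count receives the total number of replacements across both patterns.
$result = preg_replace($patterns, $replacements, $nested, -1, $count);

echo json_encode($result), PHP_EOL; // json_encode makes the line endings visible
echo $count, ' tags replaced', PHP_EOL;
```

All ten tags are handled regardless of depth. Note that "Fruit" and "Apple" end up adjacent, since open tags are replaced with an empty string; change that replacement if nested items need their own line.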