How to remove tags from the attributes of an HTML tag in PHP?

I have a large number of posts generated by an old CMS. They are in HTML markup... almost... the worst I have ever seen. They contain constructs like this:

....<IMG alt="Хит сезона - <b>Лучшие фразы...</b>" src="http://www.example.com/articles/pic.jpg" align=left>...

As you can see, strictly speaking this is not HTML, because it contains tags inside tag attributes.

I need to remove any tags from HTML attributes.

I tried parsing it with DOMDocument, but it cannot output Cyrillic characters correctly if the header, body and html tags are not present in the parsed string. And even if it could, I would have to remove them from the output.

The question is: how do I remove tags from the attributes of an HTML tag in PHP?

Is preg_replace suitable for this?

You could try this:

preg_replace('#<([^ ]+)((\s+[\w]+=((["\'])[^\5]+\5|[^ ]+))+)>#e', '"<\\1" . str_replace("\\\'", "\'", strip_tags("\\2")) . ">"', $code);

It takes every HTML opening tag (<something>), matches all of its attributes (name="value", name='value', name=value) and then strips tags from them. The str_replace is necessary because, when the e modifier is used, PHP applies addslashes to every match before evaluating it. Note that the e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so this only runs on older versions.

I tested it and it seems to work fine. :)
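For PHP versions without the e modifier, the same idea (match each opening tag, then run strip_tags over its attribute values) can be sketched with preg_replace_callback. This is only a rough sketch, not the answer's original code: the function name strip_tags_in_attributes and both patterns are assumptions made for illustration, and it makes no attempt to handle comments, script blocks or unbalanced quotes.

```php
<?php
// Hypothetical helper, not part of the original answer.
function strip_tags_in_attributes(string $html): string
{
    // An opening tag: a name, then one or more attributes whose values are
    // double-quoted, single-quoted, unquoted, or absent. Quoted values may
    // contain "<" and ">", which is how the stray tags get captured.
    $tag = '#<([a-z][a-z0-9]*)((?:\s+[\w:-]+(?:\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s"\'<>]+))?)+)\s*(/?)>#i';

    $result = preg_replace_callback($tag, function (array $m): string {
        // Strip tags only inside each quoted value; names and quotes stay as they are.
        $attrs = preg_replace_callback('#(["\'])(.*?)\1#s', function (array $q): string {
            return $q[1] . strip_tags($q[2]) . $q[1];
        }, $m[2]);

        return '<' . $m[1] . $attrs . $m[3] . '>';
    }, $html);

    // preg_replace_callback returns null on a PCRE error; fall back to the input.
    return $result ?? $html;
}

$code = '<IMG alt="Хит сезона - <b>Лучшие фразы...</b>" src="http://www.example.com/articles/pic.jpg" align=left>';
echo strip_tags_in_attributes($code), "\n";
// <IMG alt="Хит сезона - Лучшие фразы..." src="http://www.example.com/articles/pic.jpg" align=left>
```

The patterns deliberately omit the u modifier, so the function works byte-wise and does not fail on input that is not valid UTF-8, which may matter for posts exported from an old CMS.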
