Remove all attributes from HTML tags

Question

I have this HTML code:

<p style="padding:0px;">
  <strong style="padding:0;margin:0;">hello</strong>
</p>

How can I remove the attributes from all tags? I would like it to look like this:

<p>
  <strong>hello</strong>
</p>

Answer 1

Adapted from my answer to a similar question:

$text = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>';

echo preg_replace("/<([a-z][a-z0-9]*)[^>]*?(\/?)>/si",'<$1$2>', $text);

// <p><strong>hello</strong></p>

RegExp breakdown:

/              # Start Pattern
 <             # Match '<' at beginning of tags
 (             # Start Capture Group $1 - Tag Name
  [a-z]        # Match 'a' through 'z'
  [a-z0-9]*    # Match 'a' through 'z' or '0' through '9' zero or more times
 )             # End Capture Group
 [^>]*?        # Match anything other than '>', zero or more times, non-greedy (won't eat the /)
 (\/?)         # Capture Group $2 - '/' if it is there
 >             # Match '>'
/is            # End Pattern - 'i' = case-insensitive; 's' = '.' also matches newlines (no effect here, the pattern has no '.')

Add some quoting and use the replacement text <$1$2>, and it should strip any text after the tag name up to the end of the tag (/> or just >).

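Please note that this will not necessarily work on all input, as the anti-HTML-with-RegExp crowd will tell you. There are a few pitfalls, most notably <p style=">"> ending up as <p>">, plus a few other breakage cases. I would recommend looking at Zend_Filter_StripTags as a more robust tag/attribute filter in PHP.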
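To make the caveat concrete, here is a small sketch that wraps the pattern above in a function and runs it on three inputs, including the problematic one. The function name stripAttributesWithRegex and the sample strings are my own, chosen only for illustration.

```php
<?php
// Hypothetical helper name; the pattern is the one from this answer.
function stripAttributesWithRegex(string $html): string
{
    return preg_replace('/<([a-z][a-z0-9]*)[^>]*?(\/?)>/si', '<$1$2>', $html);
}

$samples = [
    // The input from the question.
    '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>',
    // A self-closing tag: the trailing slash is kept by capture group $2.
    '<br class="clear" />',
    // A '>' inside an attribute value ends the match too early.
    '<p style=">">text</p>',
];

foreach ($samples as $sample) {
    echo stripAttributesWithRegex($sample), "\n";
}
// <p><strong>hello</strong></p>
// <br/>
// <p>">text</p>
```

The last line shows the failure mode: the regex has no notion of quoted attribute values, so it stops at the first '>' it sees.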

Answer 2

Here is how to do it with native DOM:

$dom = new DOMDocument;                 // init new DOMDocument
$dom->loadHTML($html);                  // load HTML into it
$xpath = new DOMXPath($dom);            // create a new XPath
$nodes = $xpath->query('//*[@style]');  // Find elements with a style attribute
foreach ($nodes as $node) {              // Iterate over found elements
    $node->removeAttribute('style');    // Remove style attribute
}
echo $dom->saveHTML();                  // output cleaned HTML

If you want to remove all possible attributes from all possible tags, do:

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//@*');
foreach ($nodes as $node) {
    $node->parentNode->removeAttribute($node->nodeName);
}
echo $dom->saveHTML();
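Note that saveHTML() without arguments serializes a whole document, so the output of the snippets above also carries a DOCTYPE and html/body wrappers. The following is a sketch of one way to get only the cleaned fragment back. The function name removeAllAttributes is an assumption of mine, and the input is assumed to be non-empty ASCII/Latin-1 markup (loadHTML does not assume UTF-8 by default).

```php
<?php
// Hypothetical helper: strips every attribute and returns only the fragment.
function removeAllAttributes(string $html): string
{
    $dom = new DOMDocument();

    // Collect parser warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The XPath result is a snapshot, so removing while iterating forward is safe.
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//@*') as $attr) {
        $attr->ownerElement->removeAttribute($attr->nodeName);
    }

    // Serialize only the children of <body>, skipping DOCTYPE/html/body.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo removeAllAttributes('<p style="padding:0px;"><strong style="padding:0;margin:0;">hello</strong></p>');
```

Passing a node to saveHTML() serializes just that node, which is what keeps the wrappers out of the result.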

Answer 3

I would avoid using regex, since HTML is not a regular language, and instead use an HTML parser such as Simple HTML DOM.

You can get the list of attributes an object has by using attr. For example:

// str_get_html() comes from the Simple HTML DOM library, which must be included first.
$html = str_get_html('<div id="hello">World</div>');
var_dump($html->find("div", 0)->attr);
/*
array(1) {
  ["id"]=>
  string(5) "hello"
}
*/

foreach ( $html->find("div", 0)->attr as &$value ){
    $value = null;
}

print $html;
// <div>World</div>

Answer 4

$html_text = '<p>Hello <b onclick="alert(123)" style="color: red">world</b>. <i>Its beautiful day.</i></p>';
$strip_text = strip_tags($html_text, '<b>');
$result = preg_replace('/<(\w+)[^>]*>/', '<$1>', $strip_text);
echo $result;

// Result:
// Hello <b>world</b>. Its beautiful day.

Answer 5

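Another way to do this with PHP's DOMDocument class (without XPath) is to iterate over the attributes of a given node. Please note that, because of the way PHP handles the DOMNamedNodeMap class, you must iterate over the collection backward if you plan to alter it. This behavior has been discussed elsewhere and is also noted in the documentation comments. The same applies to the DOMNodeList class when removing or adding elements. To be on the safe side, I always iterate backward over these objects.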

Here is a simple example:

function scrubAttributes($html, $attributes = []) {
    $dom = new DOMDocument();
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    for ($els = $dom->getElementsByTagName('*'), $i = $els->length - 1; $i >= 0; $i--) {
        for ($attrs = $els->item($i)->attributes, $ii = $attrs->length - 1; $ii >= 0; $ii--) {
            $els->item($i)->removeAttribute($attrs->item($ii)->name);
        }
    }
    return $dom->saveHTML();
}

Here is a demo: https://3v4l.org/G8VPg
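To illustrate why the direction of iteration matters, here is a small sketch comparing a forward and a backward loop over the same element. The function name countAttributesLeft and the sample markup are made up for this illustration. The underlying point is that the attribute map is live, so removing an item shifts the remaining ones down.

```php
<?php
// Hypothetical helper: removes attributes from one <p> element and
// reports how many are still there afterwards.
function countAttributesLeft(bool $backward): int
{
    $dom = new DOMDocument();
    $dom->loadHTML('<p id="a" class="b" title="c">hello</p>');
    $p = $dom->getElementsByTagName('p')->item(0);
    $attrs = $p->attributes; // live DOMNamedNodeMap

    if ($backward) {
        // Removing from the end never shifts an index we still need.
        for ($i = $attrs->length - 1; $i >= 0; $i--) {
            $p->removeAttribute($attrs->item($i)->name);
        }
    } else {
        // Removing item 0 shifts the rest down, so index 1 skips an attribute.
        for ($i = 0; $i < $attrs->length; $i++) {
            $p->removeAttribute($attrs->item($i)->name);
        }
    }

    return $p->attributes->length;
}

echo 'backward: ', countAttributesLeft(true), " left\n";
echo 'forward:  ', countAttributesLeft(false), " left\n";
```

The backward loop should report nothing left, while the forward loop should leave at least one attribute behind.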

Answer 6

<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";

// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>

Answer 7

Hope this helps. It may not be the fastest way to do it, especially for large blocks of HTML. If anyone has suggestions for making it faster, let me know.

function StringEx($str, $start, $end)
{ 
    $str_low = strtolower($str);
    $pos_start = strpos($str_low, $start);
    $pos_end = strpos($str_low, $end, ($pos_start + strlen($start)));
    if($pos_end==0) return false;
    if ( ($pos_start !== false) && ($pos_end !== false) )
    {  
        $pos1 = $pos_start + strlen($start);
        $pos2 = $pos_end - $pos1;
        $RData = substr($str, $pos1, $pos2);
        if($RData=='') { return true; }
        return $RData;
    } 
    return false;
}

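// Note: everything between '<' and '>' is replaced, tag name included,
// so each tag ends up as an empty '<>'.
$S = '<';
$E = '>';
while ($RData = StringEx($DATA, $S, $E)) {
    // StringEx() returns true for an empty tag. The comparison must be strict,
    // because any non-empty string is loosely equal to true.
    if ($RData === true) { $RData = ''; }
    $DATA = str_ireplace($S.$RData.$E, '||||||', $DATA);
}
$DATA = str_ireplace('||||||', $S.$E, $DATA);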

Answer 8

Regex is too fragile for HTML parsing. For your example, the following would strip out the attributes:

echo preg_replace(
    "|<(\w+)([^>/]+)?|",
    "<$1",
    "<p style=\"padding:0px;\">\n<strong style=\"padding:0;margin:0;\">hello</strong>\n</p>\n"
);

Update

Made the second capture optional, and stopped it from stripping the '/' from closing tags:

|<(\\w+)([^>]+)| to |<(\\w+)([^>/]+)?|

A demonstration of this regular expression at work:

$ phpsh
Starting php
type 'h' or 'help' to see instructions & features
php> $html = '<p style="padding:0px;"><strong style="padding:0;margin:0;">hello<br/></strong></p>';
php> echo preg_replace("|<(\w+)([^>/]+)?|", "<$1", $html);
<p><strong>hello<br/></strong></p>
php> $html = '<strong>hello</strong>';
php> echo preg_replace("|<(\w+)([^>/]+)?|", "<$1", $html);
<strong>hello</strong>
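As an illustration of the fragility mentioned at the top of this answer, here is a sketch that wraps the pattern in a function and feeds it an attribute value containing a slash. The function name stripAttributesNoSlash and the link sample are my own. Because [^>/]+ stops at the first '/', the rest of the attribute is left behind.

```php
<?php
// Hypothetical helper name; the pattern is the one from this answer.
function stripAttributesNoSlash(string $html): string
{
    return preg_replace('|<(\w+)([^>/]+)?|', '<$1', $html);
}

// Works for the question's input, including the self-closing <br/>.
echo stripAttributesNoSlash('<p style="padding:0px;"><strong style="padding:0;margin:0;">hello<br/></strong></p>'), "\n";
// <p><strong>hello<br/></strong></p>

// Breaks as soon as an attribute value contains a slash.
echo stripAttributesNoSlash('<a href="/x">link</a>'), "\n";
// <a/x">link</a>
```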

Answer 9

An improved version of the regex from the top-rated answer to this question:

$text = '<div width="5px">a is less than b: a<b, ya know?</div>';

echo preg_replace("/<([a-z][a-z0-9]*)[^<|>]*?(\/?)>/si",'<$1$2>', $text);

// <div>a is less than b: a<b, ya know?</div>

Answer 10

To do specifically what andufo wants, it's simply:

$html = preg_replace( "#(<[a-zA-Z0-9]+)[^\>]+>#", "\\1>", $html );

That is, he wants to strip everything but the tag name from the opening tag. It won't work for self-closing tags, of course.
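A quick sketch of how this pattern behaves on a few inputs, including one case beyond the self-closing limitation. The function names and sample strings are hypothetical, and the second function is only a possible variant, not part of the original answer. Because [^\>]+ needs at least one character, the regex engine backtracks into the tag name when a tag has no attributes.

```php
<?php
// The pattern from this answer, wrapped in a hypothetical helper.
function stripOpeningTagAttributes(string $html): string
{
    return preg_replace('#(<[a-zA-Z0-9]+)[^\>]+>#', '\\1>', $html);
}

// A possible variant (my assumption): \b pins the end of the tag name and
// [^>]* allows tags without attributes. Self-closing slashes are still lost.
function stripOpeningTagAttributesSafer(string $html): string
{
    return preg_replace('#(<[a-zA-Z0-9]+)\b[^>]*>#', '\\1>', $html);
}

echo stripOpeningTagAttributes('<p style="padding:0px;">hi</p>'), "\n";
// <p>hi</p>
echo stripOpeningTagAttributes('<br />'), "\n";
// <br>
echo stripOpeningTagAttributes('<strong>hello</strong>'), "\n";
// <stron>hello</strong>   (the last letter of the tag name was eaten)
echo stripOpeningTagAttributesSafer('<strong>hello</strong>'), "\n";
// <strong>hello</strong>
```

For the markup in the question every opening tag carries an attribute, so the original pattern does produce the desired output there.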

Answer 11

Here is a simple way to get rid of attributes. It handles malformed HTML fairly well.

<?php
  $string = '<p style="padding:0px;">
    <strong style="padding:0;margin:0;">hello</strong>
    </p>';

  //get all html elements on a line by themselves
  $string_html_on_lines = str_replace (array("<",">"),array("\n<",">\n"),$string); 

  //find lines starting with a '<' and any letters or numbers up to the first space. throw everything after the space away.
  $string_attribute_free = preg_replace("/\n(<[\w123456]+)\s.+/i","\n$1>",$string_html_on_lines);

  echo $string_attribute_free;
?>

Answer 1: score 172, accepted, 2010-06-11 21:02:26
Answer 2: score 78, 2010-06-11 21:38:54
Answer 3: score 10, 2010-06-11 20:44:38
Answer 4: score 3, 2014-05-24 11:26:07
Answer 5: score 1, 2021-01-15 18:10:04
Answer 6: score 0, 2012-12-16 17:59:34
Answer 7: score 0, 2013-01-04 21:44:21
Answer 8: score 0, 2010-06-11 21:09:04
Answer 9: score 0, 2021-12-01 20:32:32
Answer 10: score -1, 2012-06-04 00:10:32
Answer 11: score -1, 2018-05-26 00:00:52
