Strip HTML tags except tags that have a given class

Question

I need a regular expression that strips HTML tags, except for "a" tags that have the class "classmark".

Suppose I have this HTML string:

 <b>this</b>
 <a href="#">not match</a>
 <a href="#" target="_blank">not match</a>
 <a href="#" class="classmark" target="_blank">match</a>
 <a href="#" class="classmark">match2</a>
 <a class="classmark" target="_blank">match3</a>
 <a class="classmark">match4</a>
 <b>this</b>
 <p>fggfgf</p>

I want this result:

this
not match
not match
<a href="#" class="classmark" target="_blank">match</a>
<a href="#" class="classmark">match2</a>
<a class="classmark" target="_blank">match3</a>
<a class="classmark">match4</a>
this
fggfgf

I use this function to strip HTML tags:

 function strip_tags( _html /*you can put each single tag per argument*/ )
{   
    var _tags = [], _tag = "" ;

    for( var _a = 1 ; _a < arguments.length ; _a++ )
   {
    _tag = arguments[_a].replace( /<|>/g, '' ).trim() ;
    
    if ( _tag.length > 0 ) _tags.push( _tag, "/"+_tag ); // skip arguments that are empty once < and > are removed
   }

   if ( !( typeof _html == "string" ) && !( _html instanceof String ) ) return "" ;
   else if ( _tags.length == 0 )
   { 
    return _html.replace( /<(\s*\/?)[^>]+>/g, "" );

   }
   else
   {  
    var _re = new RegExp( "<(?!("+_tags.join("|")+")\\s*\\/?)[^>]+>", "g" ); // backslashes must be doubled inside a string literal
    return _html.replace( _re, '');
   }

 }

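It strips HTML tags and leaves only the specific tags I ask for. I want the same function, extended so that I can also pass the class attribute a kept tag must have, like this: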

    strip_tags( HTMLstring , "a" ,"classmark")

Answer 1

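~~If I understand correctly, you can use a regular expression to test whether the HTML contains tag x with class attribute y, and then strip the tags with a .replace(regex, ...) call.~~ ~~It could look like this:~~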

[removed]

Edit:

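OK, I misunderstood and thought it was an array of single HTML tags. This version first splits the string into the matched HTML elements (note that it does not handle nested tags), then maps over all the parts and replaces each one, and finally joins them back together: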

function strip_tags(_html, _tag, _class) {
  // Matches an opening _tag that carries class="_class"
  const regex = RegExp(`<${_tag} (.*?)class="${_class}"(.*?)>`);
  // Match each element (plus the text that follows it) and collect the matches;
  // match() returns null when nothing matches, so fall back to an empty array
  return (_html.match(/<(.+).*?>.*?<\/\1>(.*?)([^<]*)/g) || [])
    // Map over each element and check whether it is the wanted tag with the wanted class
    .map(tag => {
      // If it is not, replace the tags with nothing and leave the content
      if (!regex.test(tag)) {
        return tag.replace(/(<([^>]+)>)/gi, '');
      // If it is, just return the element as is
      } else {
        return tag;
      }
    })
    // Now join all the mapped parts back together
    .join('');
}
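
The version above skips nested tags and expects the class attribute to contain exactly one class. As a rough sketch of another way to do it (not part of the original answer), a single `replace` pass with a callback can decide tag by tag, remembering for each open `<a>` whether it was kept so that the matching `</a>` is kept too. The name `stripTagsExcept` and its parameters are made up for illustration, and the sketch assumes double-quoted attributes and plain tag and class names with no regex metacharacters:

```javascript
// Sketch: strip every tag except <tag> elements whose class list contains cls.
// stripTagsExcept is a hypothetical helper, not taken from the answer above.
function stripTagsExcept(html, tag, cls) {
  // Opening <tag ...> with cls among the space-separated, double-quoted classes
  const keepOpen = new RegExp(
    `^<${tag}\\s(?:[^>]*\\s)?class="(?:[^"]*\\s)?${cls}(?:\\s[^"]*)?"`, 'i');
  const anyOpen = new RegExp(`^<${tag}[\\s>]`, 'i');   // any opening <tag>
  const anyClose = new RegExp(`^</${tag}\\s*>`, 'i');  // any closing </tag>
  const kept = []; // one flag per currently open <tag>: was its opening tag kept?

  return html.replace(/<[^>]+>/g, t => {
    if (anyOpen.test(t)) {
      const keep = keepOpen.test(t);
      kept.push(keep);
      return keep ? t : '';
    }
    if (anyClose.test(t)) {
      // Keep the closing tag only if the matching opening tag was kept
      return kept.pop() ? t : '';
    }
    return ''; // every other tag is removed; its text content stays
  });
}

const sample = [
  '<b>this</b>',
  '<a href="#">not match</a>',
  '<a href="#" class="classmark" target="_blank">match</a>',
  '<a class="classmark">match4</a>',
  '<p>fggfgf</p>',
].join('\n');

console.log(stripTagsExcept(sample, 'a', 'classmark'));
// this
// not match
// <a href="#" class="classmark" target="_blank">match</a>
// <a class="classmark">match4</a>
// fggfgf
```

This is still regex-based, so comments, scripts, or a `>` inside an attribute value will confuse it; for arbitrary HTML a real parser is the safer choice.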

Edit:

If you would rather use a proper HTML parser and go through each element, you can look at DOMParser to get started:

// str is the HTML string to parse
let parser = new DOMParser();
let doc = parser.parseFromString(str, "text/html");
doc
  .querySelectorAll('*')
  .forEach(node => {
    console.log(node);
  });
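
Building on the DOMParser snippet, one possible way to finish the job is to unwrap every element that is not the wanted tag with the wanted class, i.e. replace the element with its own children. This is only a sketch under some assumptions: `stripTagsDom` is a hypothetical name, it needs a browser (or another environment that provides `DOMParser`), and the tag and class names must be valid in a CSS selector as written:

```javascript
// Sketch: parse the HTML, unwrap unwanted elements, and serialize the result.
// stripTagsDom is a hypothetical helper; DOMParser is a browser API.
function stripTagsDom(html, tag, cls) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  // querySelectorAll returns a static list, so the tree can be changed while iterating
  doc.body.querySelectorAll('*').forEach(node => {
    if (!node.matches(`${tag}.${cls}`)) {
      // Replace the element with its child nodes, which keeps the text
      // and any nested elements (those are visited later in the loop)
      node.replaceWith(...node.childNodes);
    }
  });
  return doc.body.innerHTML;
}

// Only run the example where DOMParser exists (e.g. in a browser console)
if (typeof DOMParser !== 'undefined') {
  console.log(stripTagsDom(
    '<b>this</b> <a href="#">not match</a> <a class="classmark">match4</a>',
    'a',
    'classmark'
  ));
}
```

Because the markup goes through the parser and is serialized again, details such as entity escaping or quoting may differ slightly from the input string, while nested tags are handled without extra work.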

Solution 1
0 2020-08-21 10:21:28
