
Extracting Meta Tags from an HTML String Using Only JavaScript

I have received the HTML of a webpage as a string, and I am trying to extract values from the HTML tags contained in that string, specifically the meta tags. I've found ways to do this with jQuery; however, the platform I am using does not allow jQuery, and the HTML I am extracting from is technically just a string rather than a rendered page. I am hoping to extract each meta tag and save them all in an array to be used later. Are there any regex solutions?

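var rawHTML = input.rawHTML;  // page HTML, supplied by the platform as a string
var testString = "This is a <body>Test String for Regex</body>";
// Split on ">" so that each piece ends with at most one (unclosed) tag.
var pieces = rawHTML.split(">");
var metas = [];
for (var i = 0; i < pieces.length; i++) {
  var start = pieces[i].lastIndexOf("<");
  if (start === -1) continue;  // no tag in this piece
  // Restore the ">" that split() removed.
  var tag = pieces[i].slice(start) + ">";
  // Keep only the meta tags.
  if (/^<meta\b/i.test(tag)) metas.push(tag);
}
// A regex literal must be wrapped in slashes; this one captures the text inside <body>.
var twitterResults = testString.match(/<body\b[^>]*>(.*?)</);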

Most importantly, I am trying to write a regular expression to extract these tags, such as:

/<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>

but it seems I can't break out of the regex: it won't accept a semicolon as a semicolon and just gives an error.
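A likely cause of that error is the unescaped forward slash in `</\1>`: inside a JavaScript regex literal, a bare `/` ends the literal, so everything after it (including the semicolon) is parsed as part of a broken expression. Note also that `<meta>` is a void element with no closing tag, so a paired-tag pattern would not match it even once corrected. The sketch below shows the escaped pattern plus one possible regex-only way to collect meta tags. The names `pairedTagPattern` and `extractMetaTags` are made up for illustration, and the approach assumes no attribute value contains a literal `>`.

```javascript
// The pattern from the question with "/" escaped and the literal closed.
// The "i" flag lets [A-Z] match lowercase tag names as well.
var pairedTagPattern = /<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)<\/\1>/i;

// Hypothetical helper: returns one plain object per <meta> tag,
// keyed by lowercased attribute name.
function extractMetaTags(html) {
  // <meta> has no closing tag, so match the opening tag only.
  var tagPattern = /<meta\b[^>]*>/gi;
  // name="value", name='value', or name=value
  var attrPattern = /([^\s"'<>\/=]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/g;

  // match() returns null when there are no matches.
  var tags = html.match(tagPattern) || [];

  return tags.map(function (tag) {
    var attrs = {};
    var m;
    attrPattern.lastIndex = 0; // reset the shared global regex for each tag
    while ((m = attrPattern.exec(tag)) !== null) {
      // Exactly one of the three value groups is defined per match.
      var value = m[2] !== undefined ? m[2] : (m[3] !== undefined ? m[3] : m[4]);
      attrs[m[1].toLowerCase()] = value;
    }
    return attrs;
  });
}

// Usage: each array entry holds the attributes of one meta tag.
var metas = extractMetaTags('<meta name="twitter:card" content="summary">');
// metas[0].name is "twitter:card" and metas[0].content is "summary"
```

Regexes like these are fine for well-formed markup but are not a full HTML parser, so unusual input (comments containing tags, `>` inside attribute values) can produce wrong results.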

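You can use a regular expression for this, but I would actually load the string into a DOM documentFragment and then parse the fragment for meta tags by looking for type 1 nodes with nodeName === META.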
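A minimal sketch of that idea, assuming the code runs somewhere a `document` object exists (which may not be true on the asker's platform). The function names `parseToFragment` and `collectMetaNodes` are hypothetical, and using a `<template>` element is just one way to obtain a DocumentFragment from a string.

```javascript
// Browser-only: parse an HTML string into an inert DocumentFragment.
// Assumes a global `document`; not available in every environment.
function parseToFragment(html) {
  var template = document.createElement("template");
  template.innerHTML = html;
  return template.content; // a DocumentFragment
}

// Walk a node tree and collect element nodes (nodeType 1) named META.
// Only relies on nodeType, nodeName, and childNodes.
function collectMetaNodes(root) {
  var found = [];
  var stack = [root];
  while (stack.length > 0) {
    var node = stack.pop();
    if (node.nodeType === 1 && node.nodeName === "META") {
      found.push(node);
    }
    var children = node.childNodes || [];
    // Push in reverse so nodes are visited in document order.
    for (var i = children.length - 1; i >= 0; i--) {
      stack.push(children[i]);
    }
  }
  return found;
}
```

In a browser, `collectMetaNodes(parseToFragment(rawHTML))` would return the meta elements, and each one's attributes can then be read with `getAttribute`, for example `node.getAttribute("content")`.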


 