Extracting Meta Tags from an HTML string using only JavaScript

Question

I have received the HTML of a webpage as a string, and I am trying to extract values from the HTML tags contained in that string, specifically the meta tags. I've found ways to do this with jQuery, but the platform I am using does not allow jQuery, and the HTML I am extracting from is technically just a string, so it should not need to be handled as HTML. I am hoping to extract each meta tag and save them into an array to be used later. Are there any regex solutions?

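var rawHTML = input.rawHTML;
var HTMLlength = rawHTML.length;
// Split on ">" so that each piece holds at most one tag
var metas = rawHTML.split(">");
var testString = "This is a <body>Test String for Regex</body>";
// Put back the ">" that split() removed
for (var i = 0; i < metas.length; i++) {
  metas[i] = metas[i] + ">";
}
var twitterResults;
for (var i = 0; i < metas.length; i++) {
  // strip_html_tags is defined elsewhere (not shown here)
  metas[i] = strip_html_tags(metas[i]);
  //twitterResults = testString.match(<TAG\b[^>]*>(.*?)<);
}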

Most importantly, I am trying to write a regular expression to extract these tags, such as

/<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>

but it seems I can't break out of the regex: it won't accept a semicolon as a semicolon and just gives an error.

Answer 1

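You can use a regular expression for this, but I would actually load the string into a DOM documentFragment and then parse the fragment for meta tags by looking for type 1 nodes with nodeName === META.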
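
Both options could look roughly like the sketch below. It is only an illustration under a few assumptions: the function names (extractMetaTagsWithDom, extractMetaTagsWithRegex, readAttributes) are made up for this example, the DOM version assumes a browser-like environment where document exists, and the regex version is a fallback for platforms without a DOM. One likely cause of the error in the question is that a forward slash inside a JavaScript regex literal must be escaped as \/ and the literal needs a closing /; otherwise the parser treats the slash in </\1> as the end of the pattern.

```javascript
// Hypothetical helper: copy an element's attributes into a plain object.
function readAttributes(element) {
  var attrs = {};
  for (var i = 0; i < element.attributes.length; i++) {
    var attr = element.attributes[i];
    attrs[attr.name.toLowerCase()] = attr.value;
  }
  return attrs;
}

// DOM approach (assumes a browser-like environment with `document`).
// The string is parsed into the inert DocumentFragment of a <template>,
// then the tree is walked for element nodes (nodeType 1) named META.
function extractMetaTagsWithDom(html) {
  var template = document.createElement("template");
  template.innerHTML = html;
  var found = [];
  (function walk(node) {
    for (var child = node.firstChild; child; child = child.nextSibling) {
      if (child.nodeType === 1 && child.nodeName === "META") {
        found.push(readAttributes(child));
      }
      walk(child);
    }
  })(template.content);
  return found;
}

// Regex fallback for platforms without a DOM. This is not a real HTML
// parser: it also matches "<meta" inside comments or scripts, and it
// breaks on attribute values that contain ">".
function extractMetaTagsWithRegex(html) {
  var tagPattern = /<meta\b[^>]*>/gi;
  // name = "double quoted" | 'single quoted' | unquoted
  var attrPattern = /([^\s"'<>\/=]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/g;
  var tags = html.match(tagPattern) || [];
  var found = [];
  for (var i = 0; i < tags.length; i++) {
    var attrs = {};
    var m;
    attrPattern.lastIndex = 0; // reset the global regex before each tag
    while ((m = attrPattern.exec(tags[i])) !== null) {
      var value = m[2] !== undefined ? m[2] : m[3] !== undefined ? m[3] : m[4];
      attrs[m[1].toLowerCase()] = value;
    }
    found.push(attrs);
  }
  return found;
}

// Example with a made-up input string; logs an array holding one object
// with "name" and "content" keys.
var sample = '<head><meta name="description" content="Demo"></head>';
console.log(extractMetaTagsWithRegex(sample));
```

Either function returns an array with one object per meta tag, so a value such as a twitter:card entry can be looked up later by filtering on the name key. Where a DOM is available, the DOM version is the more robust choice, because it leaves the handling of quoting and malformed markup to the HTML parser.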

solution1
0 2017-10-17 17:32:51