I have received the HTML of a webpage as a string, and I am trying to extract values from the HTML tags it contains, specifically the meta tags. I've found ways to do this with jQuery, but the platform I am using does not allow jQuery, and since the HTML I am working with is just a string, I shouldn't need an actual HTML document. I would like to extract each meta tag and save them all into an array for later use. Are there any regex solutions?
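// `input` is supplied by the platform; rawHTML holds the page source as a string.
var rawHTML = input.rawHTML;
var HTMLlength = rawHTML.length;
var testString = "This is a <body>Test String for Regex</body>";

// Split at every ">" and put the ">" back so each chunk ends with a complete tag.
var metas = rawHTML.split(">");
for (var i = 0; i < metas.length; i++) {
  metas[i] = metas[i] + ">";
}

var twitterResults;
for (var i = 0; i < metas.length; i++) {
  // strip_html_tags is a helper that is not shown in this post.
  metas[i] = strip_html_tags(metas[i]);
  // twitterResults = testString.match(/<TAG\b[^>]*>(.*?)</);
}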
Most importantly, I am trying to write a regular expression to extract these tags, such as
/<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)</\1>
but it seems I can't terminate the regex: the semicolon after it is not accepted as a semicolon, and I just get an error.
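A likely cause of that error is the unescaped `/` in `</\1>`: inside a JavaScript regex literal it ends the pattern early, so the rest of the line no longer parses. The literal also needs a closing `/`, and an `i` flag if `[A-Z]` should match lowercase tag names. The sketch below shows the corrected literal plus one possible regex-only way to collect meta tags into an array. `extractMetaTags` and the sample markup are hypothetical names and data made up for illustration, not part of any platform API.

```javascript
// Corrected paired-tag pattern: "/" is escaped and the literal is closed.
// The "i" flag lets [A-Z] match lowercase tag names such as "body".
var pairedTag = /<([A-Z][A-Z0-9]*)\b[^>]*>(.*?)<\/\1>/i;
var testString = "This is a <body>Test String for Regex</body>";
var paired = pairedTag.exec(testString); // paired[1] is the tag name, paired[2] the inner text

// Hypothetical helper: returns one object per <meta ...> tag, keyed by attribute name.
function extractMetaTags(html) {
  var metaTag = /<meta\b[^>]*>/gi;
  // Matches name="value", name='value', or name=value.
  var attribute = /([^\s=\/>]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/g;
  var tags = html.match(metaTag) || [];
  return tags.map(function (tag) {
    var attrs = {};
    var body = tag.replace(/^<meta\b/i, ""); // drop the tag name before scanning attributes
    var m;
    attribute.lastIndex = 0; // reset the shared global regex for each tag
    while ((m = attribute.exec(body)) !== null) {
      var value = m[2] !== undefined ? m[2] : m[3] !== undefined ? m[3] : m[4];
      attrs[m[1].toLowerCase()] = value;
    }
    return attrs;
  });
}

// Made-up sample input for illustration.
var sample =
  '<head><meta name="twitter:card" content="summary">' +
  "<meta property=\"og:title\" content='Hello'></head>";
var extracted = extractMetaTags(sample);
console.log(extracted);
```

This is only a sketch: a regex cannot handle every valid HTML document (for example, a `>` inside a quoted attribute value, or a meta tag inside a comment), so it is best suited to markup whose shape is known in advance.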
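You can use a regular expression for this, but I would actually load the string into a DOM documentFragment and then parse the fragment for meta tags by looking for nodes of type 1 with nodeName === META.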
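A minimal sketch of that documentFragment approach follows. It assumes the code runs somewhere a browser-style global `document` exists, which may not be true on the asker's platform. It uses a `<template>` element, whose `content` property is a DocumentFragment, as one way to parse the string. `metaTagsFromFragment` is a hypothetical helper name.

```javascript
// Hypothetical helper; requires a browser-like global `document`.
function metaTagsFromFragment(html) {
  var template = document.createElement("template");
  template.innerHTML = html; // the markup is parsed into template.content (a DocumentFragment)

  var found = [];
  (function walk(node) {
    for (var child = node.firstChild; child; child = child.nextSibling) {
      if (child.nodeType !== 1) continue; // 1 === element node
      if (child.nodeName === "META") {
        // Copy every attribute of the meta element into a plain object.
        var attrs = {};
        for (var i = 0; i < child.attributes.length; i++) {
          attrs[child.attributes[i].name] = child.attributes[i].value;
        }
        found.push(attrs);
      }
      walk(child); // descend in case metas are nested inside other elements
    }
  })(template.content);
  return found;
}

// Only run the demo where a DOM is available.
if (typeof document !== "undefined") {
  console.log(metaTagsFromFragment('<meta name="twitter:card" content="summary">'));
}
```

The trade-off compared with a regex is that the browser's own HTML parser handles quoting, comments, and malformed markup, at the cost of needing a DOM at all.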