Using JavaScript regex to get meta tag data from a web page
I want to get meta tag data using JavaScript (jQuery) and regex.
Here are some meta tags:
<meta name="description" content="Amazon.com : Google Chromecast HDMI Streaming Media Player : Streaming Media Clients : Electronics" />
<meta name="title" content="Amazon.com : Google Chromecast HDMI Streaming Media Player : Streaming Media Clients : Electronics" />
I can get the content from these tags using the function below:
function getProductInfo(attr) {
  // Quote the value so keys such as og:title still form a valid selector
  var m = $('meta[name="' + attr + '"]');
  var content = m.attr("content");
  return content;
}
if (!title) var title = getProductInfo('title');
However, sometimes the meta tags take a different form, for example:
<meta property="title" content="....">
<meta property="og:title" content="....">
<meta name="title" description="....">
That's why I'm considering using regex, but I have no idea how. Please give me a tip. Thanks.
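If you really do need to work on raw HTML text (for example, a page fetched as a string rather than the live DOM), a rough regex sketch that handles both the `name` and `property` forms, with the attributes in any order, could look like this. `getMetaContent` is a hypothetical name; the sketch assumes attribute values are quoted and is no substitute for a real HTML parser.

```javascript
// Hypothetical helper: return the content= value of the first <meta> tag whose
// name= or property= equals `key`, by scanning an HTML string with regexes.
// Assumes quoted attribute values; a rough sketch, not a full HTML parser.
function getMetaContent(html, key) {
  // Each <meta ...> tag; [^>]* also crosses line breaks inside a tag.
  const tagRe = /<meta\b[^>]*>/gi;
  // attr="value" or attr='value' pairs inside a single tag.
  const attrRe = /([\w:-]+)\s*=\s*("([^"]*)"|'([^']*)')/g;

  for (const [tag] of html.matchAll(tagRe)) {
    const attrs = {};
    for (const m of tag.matchAll(attrRe)) {
      // m[3] holds a double-quoted value, m[4] a single-quoted one.
      attrs[m[1].toLowerCase()] = m[3] !== undefined ? m[3] : m[4];
    }
    const matches = attrs.name === key || attrs.property === key;
    // Skip matching tags that have no content= attribute and keep looking.
    if (matches && attrs.content !== undefined) {
      return attrs.content;
    }
  }
  return undefined;
}
```

Note that the `<meta name="title" description="....">` form has no `content` attribute at all, so this sketch returns `undefined` for it.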
$('meta').each(function() {
  console.log($(this).attr('content'));
});
No need for regex; this works regardless of the order of the attributes:
function getProductInfo(attr) {
  $('meta').each(function(index, tag) {
    // Compare name= and property= directly instead of assuming
    // attributes[0] is the key and attributes[1] is the content
    if (tag.getAttribute('name') === attr || tag.getAttribute('property') === attr) {
      console.log(attr, tag.getAttribute('content'));
    }
  });
}
getProductInfo('title');
This will get you every meta tag that has `title` as its name or property.
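The loop above only logs what it finds, and a `return` inside a jQuery `.each` callback does not return from `getProductInfo` (returning `false` merely stops the iteration). A minimal sketch that returns the first match instead, trying several keys in order (for example `og:title` before `title`), might look like this. `findMetaContent` is a hypothetical helper; it accepts anything array-like whose items have `getAttribute()`, such as the result of `document.querySelectorAll('meta')`.

```javascript
// Hypothetical helper: return the content of the first <meta> whose name= or
// property= equals one of `keys`, trying the keys in order. Returns null if
// nothing matches. `metas` is any array-like of objects with getAttribute().
function findMetaContent(metas, keys) {
  const list = Array.from(metas);
  for (const key of keys) {
    for (const meta of list) {
      const matches =
        meta.getAttribute("name") === key || meta.getAttribute("property") === key;
      const content = meta.getAttribute("content");
      // A plain loop, unlike a .each callback, lets us return straight out.
      if (matches && content !== null) {
        return content;
      }
    }
  }
  return null;
}
```

In a browser you would call it as `findMetaContent(document.querySelectorAll('meta'), ['og:title', 'title'])`.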
Here's how to do it without using regex.
No libraries, pure vanilla JS:
var meta = document.querySelectorAll('meta');
for (var i = 0; i < meta.length; i++) {
  var content = meta[i].getAttribute('content'); /* here's the content */
}
http://jsfiddle.net/JA9Yq/
jQuery:
$('meta').each(function(index, tag) {
  var content = tag.getAttribute('content');
});
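In both loops above, `content` only holds one tag's value at a time (the plain `for` loop overwrites it on every iteration), so nothing is kept once the loop ends. If you want all of them, one option is to collect them into a `Map` keyed by `name` or, failing that, `property`. `collectMeta` is a hypothetical name, and keeping the first occurrence of a repeated key is an arbitrary choice for this sketch.

```javascript
// Hypothetical helper: collect <meta> contents into a Map keyed by name= or,
// if there is no name, property=. `metas` is any array-like of objects with
// getAttribute(), e.g. document.querySelectorAll('meta') in a browser.
function collectMeta(metas) {
  const result = new Map();
  for (const meta of Array.from(metas)) {
    const key = meta.getAttribute("name") || meta.getAttribute("property");
    const content = meta.getAttribute("content");
    // Skip tags such as <meta charset="utf-8"> that have no key or no content,
    // and keep only the first occurrence of a repeated key.
    if (key && content !== null && !result.has(key)) {
      result.set(key, content);
    }
  }
  return result;
}
```

Usage in a browser: `collectMeta(document.querySelectorAll('meta')).get('og:title')`.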
It's also possible using regex:
<meta[^>]+content="([^"]*)"
Result:
$matches Array:
(
[0] => Array
(
[0] => <meta name="description" content="Amazon.com : Google Chromecast HDMI Streaming Media Player : Streaming Media Clients : Electronics"
[1] => <meta name="title" content="Amazon.com : Google Chromecast HDMI Streaming Media Player : Streaming Media Clients : Electronics"
)
[1] => Array
(
[0] => Amazon.com : Google Chromecast HDMI Streaming Media Player : Streaming Media Clients : Electronics
[1] => Amazon.com : Google Chromecast HDMI Streaming Media Player : Streaming Media Clients : Electronics
)
)
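To run the pattern above in JavaScript itself rather than reading it from the array dump shown, `String.prototype.matchAll` with the `g` flag yields each match together with its capture group. A minimal sketch with `extractMetaContents` as a hypothetical name; it only handles double-quoted `content` values and does not tell you which key (`name` or `property`) each value belongs to.

```javascript
// Hypothetical helper: return capture group 1 (the content= value) of every
// match of the answer's pattern, in document order.
function extractMetaContents(html) {
  // The g flag is required by matchAll; [^"]* stops at the closing quote.
  const re = /<meta[^>]+content="([^"]*)"/g;
  return Array.from(html.matchAll(re), (m) => m[1]);
}
```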
Maybe this:
var desc  = $('meta[name=description]').attr("content");
var title = $('meta[name=title]').attr("content");
var desc  = $('meta[property=description]').attr("content");
var title = $("meta[property='og:title']").attr("content");
Note: apparently the selector doesn't like the colon in `og:title`. I was able to fix it by putting single quotes around the value inside the double-quoted selector string, as in the last line above.
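The colon problem comes from CSS itself: an unquoted attribute value in a selector must be a valid CSS identifier, and `og:title` is not one, whereas a quoted value is a CSS string and may contain a colon. Here is a small sketch of a hypothetical `metaSelector` helper that always quotes the value (backslash-escaping any `\` or `"` in it), so a function like the question's `getProductInfo` could accept keys such as `og:title`. It assumes the value contains no line breaks.

```javascript
// Hypothetical helper: build a selector for a meta tag with the attribute
// value always quoted, e.g. meta[property="og:title"]. Backslashes and double
// quotes in the value are escaped so the CSS string stays well-formed.
// Assumes the value contains no line breaks.
function metaSelector(attrName, value) {
  const escaped = String(value).replace(/["\\]/g, "\\$&");
  return "meta[" + attrName + '="' + escaped + '"]';
}
```

Usage: `$(metaSelector('property', 'og:title')).attr('content')`, or the same string with `document.querySelector`.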
This should work on all meta tags, I think...
/\<meta.*?\>/
Here's a link to RegExr, which is a good tool for trying things out:
http://gskinner.com/RegExr
But I think it's better to follow @subZero's advice and not use regex if you don't have to.
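One caveat with `/\<meta.*?\>/`: `.` does not match line breaks unless the `s` flag is set, so a `<meta>` tag whose attributes are split across lines is skipped, whereas `[^>]*` does cross line breaks. (The backslashes before `<` and `>` are also unnecessary.) A small sketch with `listMetaTags` as a hypothetical name, using `g` to return every tag rather than just the first and `i` to accept `<META>` as well:

```javascript
// Hypothetical helper: return every <meta ...> tag in an HTML string.
// [^>]* (unlike .*?) also matches across line breaks inside a tag,
// \b avoids matching names like <metadata>, and i accepts <META>.
function listMetaTags(html) {
  return html.match(/<meta\b[^>]*>/gi) || [];
}
```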
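Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.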