How to find all the links in a webpage that have a specific file extension?
Is it possible to find every href on a website that has a certain file extension? For example, it would print
http://www.test.com/something.mp3
http://www.test.com/somelinktoamuscifile.mp3
http://www.test.com/music.mp3
It would show all of the links with a file extension of .mp3, for example.
Would you do something like this?
var extension = ".mp3";
var checker = url + extension;
if (url == checker) { console.log(url); }
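As written, checker is always url with ".mp3" appended, so url == checker can never be true. A minimal sketch of the intended test, assuming an ES2015+ runtime for String.prototype.endsWith; the helper name hasExtension and the sample URL list are hypothetical:

```javascript
// Hypothetical helper: true when the URL string ends with the given extension.
// Both sides are lowercased, so "MUSIC.MP3" also matches ".mp3".
function hasExtension(url, extension) {
  return url.toLowerCase().endsWith(extension.toLowerCase());
}

// Hypothetical sample data standing in for links collected from a page.
var urls = [
  "http://www.test.com/something.mp3",
  "http://www.test.com/index.html",
  "http://www.test.com/music.mp3"
];

// Print only the URLs that end with ".mp3".
urls.forEach(function (url) {
  if (hasExtension(url, ".mp3")) {
    console.log(url);
  }
});
```

Note that a plain endsWith check misses links such as music.mp3?dl=1, because the query string comes after the extension.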
So you want to extract all links that contain a certain string from any given URL? Maybe this script will help you:
var request = require('request');
var cheerio = require('cheerio');
var url = "http://www.stackoverflow.com";
var toFind = "delete"; // use a file extension or whatever you want to find
// Download the page, then print every <a> href that contains toFind
request(url, function (err, resp, body) {
    if (err) throw err;
    var $ = cheerio.load(body);
    $('a').each(function (i, element) {
        var href = $(element).attr('href');
        // indexOf matches toFind anywhere in the href, not only at the end
        if (href && href.indexOf(toFind) !== -1) {
            console.log(href);
        }
    });
});
Output:
$ node scraping.js
http://ux.stackexchange.com/questions/49991/should-yes-delete-it-be-red-or-green
Just change the values of url and toFind. There are good tutorials on web scraping here and here. Of course, this can be done in many other programming languages; I merely used JavaScript because you tagged the question that way.
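The script above matches toFind anywhere in the href and prints relative links such as /music.mp3 unresolved. For file extensions specifically, a post-processing step along these lines may help. This is a sketch that assumes the hrefs have already been collected into an array (for example inside the $('a').each loop) and that the WHATWG URL class is available (global in Node 10+ and modern browsers); filterByExtension and the sample hrefs are hypothetical:

```javascript
// Hypothetical helper: resolves each href against the page URL, keeps only
// those whose path ends with `extension`, and drops duplicates.
function filterByExtension(hrefs, pageUrl, extension) {
  var seen = new Set();
  var result = [];
  hrefs.forEach(function (href) {
    var resolved;
    try {
      // "/music.mp3" relative to "http://www.test.com/" -> "http://www.test.com/music.mp3"
      resolved = new URL(href, pageUrl);
    } catch (e) {
      return; // skip hrefs that cannot be parsed as URLs
    }
    // Test the path only, so "?download=1" or "#t=30" does not hide the extension.
    var path = resolved.pathname.toLowerCase();
    if (path.endsWith(extension.toLowerCase()) && !seen.has(resolved.href)) {
      seen.add(resolved.href);
      result.push(resolved.href);
    }
  });
  return result;
}

var links = filterByExtension(
  ["/music.mp3", "something.mp3?download=1", "/page.html", "/music.mp3"],
  "http://www.test.com/",
  ".mp3"
);
console.log(links);
```

Here the duplicate /music.mp3 and the .html link are dropped, and the query string is kept in the printed URL because only the path is tested.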
Here is a native JavaScript solution that works in current browsers (IE8+, Chrome, Firefox) without jQuery:
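// Logs the URL of every <a> whose href attribute ends with the given extension
function getLinksWithExtension(extension) {
    // a[href$=".mp3"] matches anchors whose href value ends with ".mp3"
    var links = document.querySelectorAll('a[href$="' + extension + '"]'),
        i;
    for (i = 0; i < links.length; i++) {
        console.log(links[i].href); // .href is the resolved, absolute URL
    }
}
getLinksWithExtension('.mp3');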
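One caveat: in HTML documents the $= selector compares href values case-sensitively, so a[href$=".mp3"] misses links ending in ".MP3". Selectors Level 4 adds an i flag (a[href$=".mp3" i]) that modern browsers support but IE does not, so using it gives up the IE8 compatibility. A small sketch of a selector builder under that assumption; extensionSelector is a hypothetical name, and it also escapes double quotes and backslashes so an unusual extension cannot break the selector string:

```javascript
// Hypothetical helper: builds a selector such as a[href$=".mp3" i].
// Backslashes and double quotes in the extension are backslash-escaped so the
// attribute value stays a valid CSS string.
function extensionSelector(extension, caseInsensitive) {
  var escaped = extension.replace(/["\\]/g, "\\$&");
  return 'a[href$="' + escaped + '"' + (caseInsensitive ? " i" : "") + "]";
}

console.log(extensionSelector(".mp3", true));  // a[href$=".mp3" i]
console.log(extensionSelector(".mp3", false)); // a[href$=".mp3"]

// In a browser, the result can be passed straight to querySelectorAll:
// document.querySelectorAll(extensionSelector(".mp3", true));
```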
I think it goes like this:
var mp3_extension = 'mp3'; // no leading dot: split('.') removes the dots
var url_string = url.split('.');
var url_extension = url_string[url_string.length - 1]; // text after the last '.'
if (url_extension === mp3_extension) {
    //go go go!!!
}
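Splitting the whole URL on '.' has two weak spots: a query string leaves "mp3?dl=1" as the last piece, and a URL with no file name, such as http://www.test.com, yields "com". A sketch that inspects only the last path segment, assuming an absolute URL and the WHATWG URL class; getExtension is a hypothetical name:

```javascript
// Hypothetical helper: returns the lowercase extension of the last path
// segment (without the dot), or "" when there is none. Expects an absolute URL;
// new URL() throws for relative ones.
function getExtension(url) {
  var path = new URL(url).pathname; // drops "?query" and "#fragment"
  var lastSegment = path.substring(path.lastIndexOf("/") + 1);
  var dot = lastSegment.lastIndexOf(".");
  // dot > 0 also skips names that only start with a dot, like ".htaccess"
  return dot > 0 ? lastSegment.substring(dot + 1).toLowerCase() : "";
}

console.log(getExtension("http://www.test.com/music.mp3"));         // mp3
console.log(getExtension("http://www.test.com/song.MP3?dl=1#t=5")); // mp3
console.log(getExtension("http://www.test.com"));                   // "" (empty string)
```

With this helper the check becomes getExtension(url) === 'mp3'.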