
How to find all the links in a webpage that have a specific file extension?

Is it possible to find every href in a website that has a certain file extension? For example, it would print:

http://www.test.com/something.mp3
http://www.test.com/somelinktoamuscifile.mp3
http://www.test.com/music.mp3

It would show all the links with a given file extension, .mp3 for example.

Would you do something like this?

var extension = ".mp3"

var checker = url + extension

if(url == checker){console.log(url);}

So you want to extract all links that contain a certain string from a given URL? Maybe this script will help you:

var request = require('request');
var cheerio = require('cheerio');

var url = "http://www.stackoverflow.com";
var toFind = "delete";  // use a file extension or any other string you want to find

request(url, function(err, resp, body) {
    if (err) throw err;
    var $ = cheerio.load(body);

    $('a').each(function (i, element) {
        var a = $(this);
        //console.log(a.attr('href'));

        var href = a.attr('href');
        if (href && href.indexOf(toFind) != -1) {
            console.log(href);
        }
    })
})

Output:
$ node scraping.js 
http://ux.stackexchange.com/questions/49991/should-yes-delete-it-be-red-or-green

Just change the values of url and toFind. There is a good tutorial on web scraping here and here. Of course, this can be done in many different programming languages; I only used JavaScript because you tagged the question that way.
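The indexOf check above matches the string anywhere in the link, so ".mp3" would also match something like "notes.mp3.html". A stricter filter could look like the sketch below, assuming Node's built-in global URL class is available. The function name filterByExtension and the sample hrefs are hypothetical and not part of the script above. It resolves relative hrefs against the page URL and compares only the end of the path, so query strings and fragments are ignored.

```javascript
// Hypothetical helper: keep only the links whose path ends with `extension`.
// `hrefs` are raw href attribute values, `baseUrl` is the page they came from.
function filterByExtension(hrefs, baseUrl, extension) {
    var wanted = extension.toLowerCase();
    var matches = [];

    hrefs.forEach(function (href) {
        var resolved;
        try {
            // Resolves relative links such as "/music.mp3" against the page URL.
            resolved = new URL(href, baseUrl);
        } catch (e) {
            return; // skip hrefs that cannot be parsed as a URL
        }
        // Ignore mailto:, javascript: and other non-web links.
        var isWeb = resolved.protocol === 'http:' || resolved.protocol === 'https:';
        // pathname excludes the query string and the fragment.
        if (isWeb && resolved.pathname.toLowerCase().endsWith(wanted)) {
            matches.push(resolved.href);
        }
    });

    return matches;
}

// Sample input, made up for illustration.
var sampleHrefs = [
    'http://www.test.com/something.mp3',
    '/music.mp3?download=1',
    'notes.mp3.html',
    'about.html'
];
var mp3Links = filterByExtension(sampleHrefs, 'http://www.test.com/page/', '.mp3');
console.log(mp3Links);
```

In the scraping script, the hrefs array could be collected inside the $('a').each(...) loop and passed to this function together with url.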

Here is a native JavaScript solution that works in current browsers (IE8+, Chrome, Firefox) without jQuery.

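function getLinksWithExtension(extension) {
    // [href$="..."] selects anchors whose href attribute ends with the given text
    var links = document.querySelectorAll('a[href$="' + extension + '"]'),
        i;

    for (i = 0; i < links.length; i++) {
        // print the link's URL rather than the whole element
        console.log(links[i].href);
    }
}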
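A variant that returns the matching URLs as an array, instead of logging inside the loop, can be easier to reuse. This is only a sketch: the name collectHrefsWithExtension and the optional root parameter are hypothetical additions. Keep in mind that the $= selector compares against the raw href attribute, so a link written as "song.mp3?x=1" would not match ".mp3".

```javascript
// Hypothetical variant: return matching URLs as an array of strings.
// `root` is any object with querySelectorAll (defaults to the current document).
function collectHrefsWithExtension(extension, root) {
    var scope = root || document;
    // [href$="..."] selects anchors whose href attribute ends with the given text.
    var links = scope.querySelectorAll('a[href$="' + extension + '"]');
    var hrefs = [];
    var i;

    for (i = 0; i < links.length; i++) {
        // In browsers, the href property holds the resolved, absolute URL.
        hrefs.push(links[i].href);
    }
    return hrefs;
}

// Usage in a browser console:
// collectHrefsWithExtension('.mp3').forEach(function (u) { console.log(u); });
```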

I think it goes like this:

var mp3_extension = 'mp3';  // no leading dot: split('.') removes the dots
var url_string = url.split('.');
var url_extension = url_string[url_string.length - 1];

if (url_extension === mp3_extension) {

    //go go go!!!

}
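Splitting on every dot can misfire when the URL has a query string, a fragment, or no extension at all, because the last piece is then part of the host or path. A more defensive check using only string operations might look like this sketch. The helper names getExtension and hasExtension are hypothetical.

```javascript
// Hypothetical helper: return the extension of a URL's last path segment,
// including the leading dot and lowercased, or '' when there is none.
function getExtension(url) {
    // Drop the fragment first, then the query string.
    var withoutQuery = url.split('#')[0].split('?')[0];
    // Drop "scheme://host" so that dots in the host name are not counted.
    var path = withoutQuery.replace(/^[a-z][a-z0-9+.-]*:\/\/[^\/]*/i, '');
    // Keep only the last path segment, e.g. "music.mp3".
    var segment = path.substring(path.lastIndexOf('/') + 1);
    var dot = segment.lastIndexOf('.');
    return dot === -1 ? '' : segment.substring(dot).toLowerCase();
}

// Hypothetical helper: case-insensitive comparison against a wanted extension.
function hasExtension(url, extension) {
    return getExtension(url) === extension.toLowerCase();
}

console.log(getExtension('http://www.test.com/music.mp3?dl=1#t')); // ".mp3"
console.log(getExtension('http://www.test.com/music'));            // ""
console.log(hasExtension('songs/a.MP3', '.mp3'));                  // true
```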
