
Loop through list of clickable elements and write out the html to respective files

I'm using jQuery to get a list of elements that contain certain keywords. I can get the list of elements, but I don't know how to loop through each element, click on its child element, and download the newly loaded page. Here's the CasperJS code I have so far:

var casper = require('casper').create({
    clientScripts: ["/var/www/html/project/public/js/jquery-3.3.1.min.js"]
});

var fs = require('fs');

casper.start('https://m.1xbet.co.ke/en/line/Football/', function () {
    var links = casper.evaluate(function () {
        $.expr[":"].contains = $.expr.createPseudo(function (arg) {
            return function (elem) {
                return $(elem).text().toUpperCase().indexOf(arg.toUpperCase()) >= 0;
            };
        });
        return $("#events-betting").find("li.events__item_head:contains(World cup)");
    });

    var date = new Date(), year = date.getFullYear(), month = date.getMonth() + 1, day = date.getDate();
    var folderName = year + '-' + month + '-' + day;

    // loop would go here to save each file
    var path = "destination/" + folderName + "/1xbet/worldcup-1";
    fs.write(path + ".html", this.getHTML(), "w");

});

casper.run();

I'd like to click on the individual items in the links object. They aren't anchor tags; they are clickable divs with inline JavaScript listening for a click.

The goal is to click on each div that contains the text I'm interested in. Once it is clicked, I can either scrape the HTML and save it to a file or get the current URL; either will be fine for my purposes. Since there could be multiple divs with the desired text, I'd like a way to loop through each one and perform the same operation.

This is an example of the page I'm interested in:

https://m.1xbet.co.ke/en/line/Football/

The parent element in this case is #events-betting, and nested inside it is a list of li tags with clickable divs.

"I can either choose to scrape the HTML and save it in a file or get the current url"

Of course, the solution is very specific to this exact site, but that is quite normal in web scraping.

casper.start('https://m.1xbet.co.ke/en/line/Football/', function () {

  var links = casper.evaluate(function () {

    $.expr[":"].contains = $.expr.createPseudo(function (arg) {
      return function (elem) {
        return $(elem).text().toUpperCase().indexOf(arg.toUpperCase()) >= 0;
      };
    });

    var links = [];
    // Better to scrape .events__title, since it carries the data-href and data-champ attributes
    $("#events-betting").find(".events__title:contains(World cup)").each(function (i, item) {
      // The segment at index 1 of data-href is appended after the data-champ id
      var lastPartOfUrl = item.getAttribute("data-href").split("/");
      links.push("https://m.1xbet.co.ke/en/line/Football/" + item.getAttribute("data-champ") + "-" + lastPartOfUrl[1] + "/");
    });

    return links;
  });

  console.log(links);
});

The result:

https://m.1xbet.co.ke/en/line/Football/1536237-FIFA-World-Cup-2018/,https://m.1xbet.co.ke/en/line/Football/1204917-FIFA-World-Cup-2018-Winner/,https://m.1xbet.co.ke/en/line/Football/1518431-FIFA-World-Cup-2018-Special-bets/,https://m.1xbet.co.ke/en/line/Football/1706515-FIFA-World-Cup-2018-Teams-Statistics-Group-Stage/
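With the URLs in hand, the remaining part of the question (saving each page to its own file) could be handled roughly as follows. This is only a sketch: `formatFolderName` and `queuePageDownloads` are hypothetical helper names, and the `destination/<date>/1xbet/worldcup-N.html` layout is carried over from the question rather than required by CasperJS. The helpers take the casper instance and the `fs` module as arguments, so the logic can be tried outside PhantomJS with stand-in objects.

```javascript
// Sketch only: both helper names below are hypothetical, not CasperJS APIs.

// Build a folder name such as "2018-6-14" from a Date,
// using the same year-month-day format as the question.
function formatFolderName(date) {
    return date.getFullYear() + '-' + (date.getMonth() + 1) + '-' + date.getDate();
}

// Queue one navigation step per URL; each step writes the loaded page's HTML
// to its own numbered file. `casper` is assumed to be a CasperJS instance and
// `fs` the PhantomJS fs module (require('fs')).
function queuePageDownloads(casper, fs, links, folderName) {
    links.forEach(function (link, index) {
        var path = 'destination/' + folderName + '/1xbet/worldcup-' + (index + 1) + '.html';
        casper.thenOpen(link, function () {
            // Inside a step callback, `this` is the casper instance.
            fs.write(path, this.getHTML(), 'w');
        });
    });
}
```

Inside the `casper.start` callback, after `links` has been computed, the call would be `queuePageDownloads(this, fs, links, formatFolderName(new Date()));`, followed by `casper.run();` at the end of the script. Opening the URLs directly avoids having to click the divs at all, which is why the answer collects plain strings instead of returning the jQuery object from `evaluate`.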

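Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.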

 