
How do I write to files using cheerio and request in node?

I am using cheerio and request in Node to write a simple scraper, much like the one in this tutorial:

https://www.digitalocean.com/community/tutorials/how-to-use-node-js-request-and-cheerio-to-set-up-simple-web-scraping

My file differs slightly: the URLs to be scraped take the form http://foo.com/2014/12/, http://foo.com/2014/11/, etc. Here it is:

var fs = require('fs');
var request = require('request');
var cheerio = require('cheerio');


var year = ["2014", "2013", "2012", "2011", "2010"];
var yearLength = year.length;

var month = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12'];
var monthLength = month.length;

var postDate = '';
var singlePost = {};

var posts = {
    colors: {
        tastes: []
    }
};

var j, k = 0;

for (j=0; j < yearLength; j++) {
    for (var k=0; k < monthLength; k++) {

        var fooUrl = 'http://foo.com/' + year[j] + '/' + month[k];
        var localUrl = './' + year[j] + '/' + month[k] + '/texts.json';

        console.log(JSON.stringify(localUrl));

        request(fooUrl, function (error, response, html) {
            if (!error && response.statusCode == 200) {
                var $ = cheerio.load(html);

                $('ul.slashdot').each(function(i, element){
                    var postDate = $(this).children().first().text();
                    $(this).children().first().siblings().each(function(i, element){

                        var post = $(this).children();

                        var postUrl = post.eq(3).attr('href');

                        var singlePost = {
                            day: postDate,
                            url: postUrl,
                            year: year[j],
                            month: month[k]
                        };

                        posts.colors.tastes.push(singlePost);

                    });
                });

                fs.writeFile(localUrl, JSON.stringify(posts, null, 2));
            }
        });
    }
}

Now, when I iterate over the two loops, I am unable to write to the file. Using writeFileSync doesn't help either.

What am I doing wrong? When I log to the console, the output shows up fine as pretty-printed JSON, but it is never actually written to the file.

Further, year[j] and month[k] are undefined inside the callback. How do I get the callback to see the values of those two variables?

You need to ensure that all directories on the path exist, and create them if they do not.
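As a rough sketch of that idea using only Node's built-in modules: the helper name writeJsonEnsuringDir and the temporary base directory are made up for illustration, and the { recursive: true } option assumes a reasonably recent Node (10.12 or later). It creates the missing year/month directories first and only then writes the JSON file.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical helper (not from the question): create any missing parent
// directories, then write the data as pretty-printed JSON.
function writeJsonEnsuringDir(filePath, data) {
    // recursive: true creates every missing directory on the path and
    // does not throw if the directory already exists.
    fs.mkdirSync(path.dirname(filePath), { recursive: true });
    fs.writeFileSync(filePath, JSON.stringify(data, null, 2));
}

// Assumption for the demo: write under a fresh temp directory instead of './'.
var baseDir = fs.mkdtempSync(path.join(os.tmpdir(), 'scraper-'));
var target = path.join(baseDir, '2014', '12', 'texts.json');

writeJsonEnsuringDir(target, { colors: { tastes: [] } });

// Read the file back to confirm it was really written.
var written = JSON.parse(fs.readFileSync(target, 'utf8'));
console.log(written);
```

If you stay with the asynchronous fs.writeFile, passing it a callback and logging its error argument should make a missing-directory failure (ENOENT) visible rather than silent.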

You may want to take a look at the node-fs-extra module. Its createFile(file, callback) should help: https://www.npmjs.com/package/node-fs-extra
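On the second part of the question (year[j] and month[k] being undefined): variables declared with var are function-scoped, so every callback sees the same j and k, and by the time the request callbacks run both loops have already finished. The sketch below is only an illustration of that effect. fakeRequest is a hypothetical stand-in for request() that stores each callback and runs it after the loops, and the shortened arrays are sample data. One possible fix, shown here, is to iterate with forEach so each callback closes over its own year and month values.

```javascript
// Hypothetical stand-in for request(): it stores the callback and runs it
// later, after the loops have finished, as a real HTTP callback would.
var pending = [];
function fakeRequest(url, callback) {
    pending.push(callback);
}

var years = ['2014', '2013'];
var months = ['11', '12'];
var seenWithVar = [];
var seenWithForEach = [];

// Same shape as the question: j and k are shared by all callbacks.
for (var j = 0; j < years.length; j++) {
    for (var k = 0; k < months.length; k++) {
        fakeRequest('http://foo.com/' + years[j] + '/' + months[k], function () {
            // Runs after the loops: j and k are already past the last index.
            seenWithVar.push(years[j] + '/' + months[k]);
        });
    }
}

// One possible fix: forEach gives every callback its own y and m.
years.forEach(function (y) {
    months.forEach(function (m) {
        fakeRequest('http://foo.com/' + y + '/' + m, function () {
            seenWithForEach.push(y + '/' + m);
        });
    });
});

// Simulate the responses arriving once the loops are done.
pending.forEach(function (callback) {
    callback();
});

console.log(seenWithVar);     // four copies of 'undefined/undefined'
console.log(seenWithForEach); // [ '2014/11', '2014/12', '2013/11', '2013/12' ]
```

The same applies to localUrl in the question: it would need to be computed inside the per-iteration function, otherwise every callback writes to the path from the last iteration.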
