nodejs piping stream after modifying data

I am learning about streaming with nodejs, and I understand the examples shown in the request npm module:

request(url).pipe(fs.createWriteStream('./filename.json'))

But there are two parts to my problem.

Case 1:

function fetchSitemaps() {
    return requestAsync(url).then(data => {
        const $ = cheerio.load(data);
        let urls = [];
        $("loc").each((i, e) => urls.push($(e).text()));
        fs.writeFileSync('./sitemaps.json', JSON.stringify(urls))
    })
}

I want to convert the above from writeFileSync to createWriteStream, but how do I keep appending data to an array that is stored in JSON format?

Case 2:

function fetchLyricUrls() {
    let sitemaps = JSON.parse(fs.readFileSync('./sitemaps.json'));

    sitemaps.forEach((sitemap, i) => {
        let fileName = i + '.json';

        if(url_pat.exec(sitemap)) {
            fileName = url_pat.exec(sitemap)[1] + '.json';
        }

        requestAsync(url).then(data => {
            const $ = cheerio.load(data);
            let urls = [];
            $("loc").each((i, e) => urls.push($(e).text()));    
            return urls;
        }).then(urls => {
            let allUrls = [];
            urls.map(u => {
                return requestAsync(u).then(sm => {
                    const $ = cheerio.load(sm);
                    $("loc").each((i, e) => allUrls.push($(e).text()))
                    fs.writeFileSync('./lyrics.json', JSON.stringify(allUrls))
                    return allUrls;
                });
            });
        });
    });
}

The first part of the problem is the same: appending to JSON data using a write stream. But this time I want to parse the HTML data and extract some text, and I want to send that text through the stream, not the HTML data as a whole.

So let's split up the answers.

Case 1

First of all I'd try to keep the data as a stream and try not to accumulate it. So in essence, instead of loading the whole sitemap and then parsing it, I'd use something like xml-nodes so that the nodes become a separate stream. Then my module scramjet can transform each node:

    const fs = require('fs');
    const request = require('request');
    const xmlNodes = require('xml-nodes');
    const cheerio = require('cheerio');
    const scramjet = require('scramjet');

    const writable = fs.createWriteStream('./sitemaps.json');
    writable.write('[');

    let first = 0;
    const entries = request('http://example.com/sitemap.xml')
            // this fetches your sitemap
        .pipe(xmlNodes('loc'))
            // this extracts your "loc" nodes
        .pipe(new scramjet.DataStream())
            // this creates a mappable stream
        .map((nodeString) => cheerio.load(nodeString)('loc').text())
            // this extracts the text as in your question
        .map((url) => (first++ ? ',' : '') + JSON.stringify(url));
            // this makes sure that strings are nicely escaped
            // and prepends them with a comma on every node, but first one

    entries.on('end', () => writable.end(']'));
        // once the mapped stream has ended, this closes the array,
        // so "]" is written after the last entry
    entries.pipe(writable, {end: false});
        // and this will push all your entries to the writable stream
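
If you'd rather avoid extra dependencies for the array framing, the same idea can be sketched with Node's core stream module alone. This is only an illustration under assumptions: `frameEntry`, `jsonArrayStream` and `writeJsonArray` are hypothetical helper names that do not come from any library, and `source` is assumed to be an object-mode readable that emits one URL per chunk.

```javascript
const fs = require('fs');
const { Transform, pipeline } = require('stream');

// Hypothetical helper: text for the entry at position `index` of a JSON array.
// The first entry opens the array; later entries are preceded by a comma.
function frameEntry(value, index) {
  return (index === 0 ? '[' : ',') + JSON.stringify(value);
}

// Hypothetical helper: a Transform that accepts values one at a time
// (object mode on the writable side) and emits the text of one JSON array.
function jsonArrayStream() {
  let count = 0;
  return new Transform({
    writableObjectMode: true,
    transform(value, _encoding, done) {
      done(null, frameEntry(value, count++));
    },
    flush(done) {
      // Close the array; an empty input still yields valid JSON.
      this.push(count === 0 ? '[]' : ']');
      done();
    },
  });
}

// Assumed usage: `source` is any object-mode readable of URLs.
function writeJsonArray(source, path, callback) {
  pipeline(source, jsonArrayStream(), fs.createWriteStream(path), callback);
}
```

Because the closing bracket is written in `flush`, it only goes out after every entry has passed through, and `pipeline` ends the file stream for you.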

Case 2

Here you'll need to do something similar, although if Case 1 is an intermediate step, then I'd suggest storing the data as lines of JSON (one value per line) rather than as an array. It would be easier to stream that way.
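
To illustrate the lines-of-JSON suggestion, here is a minimal sketch using only Node's core modules. The helper names (`toJsonLine`, `openJsonLinesWriter`, `readJsonLines`) are hypothetical and exist only for this example; the file path is whatever you pass in.

```javascript
const fs = require('fs');
const readline = require('readline');

// Hypothetical helper: serialize one value as a single line.
// JSON.stringify escapes newlines inside strings, so each entry stays on one line.
function toJsonLine(value) {
  return JSON.stringify(value) + '\n';
}

// Hypothetical helper: open `path` in append mode and write one entry per line.
function openJsonLinesWriter(path) {
  const out = fs.createWriteStream(path, { flags: 'a' });
  return {
    write: (value) => out.write(toJsonLine(value)),
    end: () => out.end(),
  };
}

// Hypothetical helper: read the file back lazily, one parsed entry at a time.
async function* readJsonLines(path) {
  const lines = readline.createInterface({
    input: fs.createReadStream(path),
    crlfDelay: Infinity,
  });
  for await (const line of lines) {
    if (line.trim() !== '') yield JSON.parse(line);
  }
}
```

With this layout there is no opening bracket, comma or closing bracket to keep track of, so several requests can append to the same file in any order, and the reader never has to hold the whole file in memory.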
