nodejs piping stream after modifying data
I am learning about streaming with nodejs. I understand the examples shown in the request npm module:
request(url).pipe(fs.createWriteStream('./filename.json'))
But there are two parts to my problem.
Case 1:
function fetchSitemaps() {
    return requestAsync(url).then(data => {
        const $ = cheerio.load(data);
        let urls = [];
        $("loc").each((i, e) => urls.push($(e).text()));
        fs.writeFileSync('./sitemaps.json', JSON.stringify(urls))
    })
}
I want to convert the above from writeFileSync to createWriteStream, but how do I keep appending data to an array that is stored in JSON format?
Case 2:
function fetchLyricUrls() {
    let sitemaps = JSON.parse(fs.readFileSync('./sitemaps.json'));
    sitemaps.forEach((sitemap, i) => {
        let fileName = i + '.json';
        if(url_pat.exec(sitemap)) {
            fileName = url_pat.exec(sitemap)[1] + '.json';
        }
        requestAsync(url).then(data => {
            const $ = cheerio.load(data);
            let urls = [];
            $("loc").each((i, e) => urls.push($(e).text()));
            return urls;
        }).then(urls => {
            let allUrls = [];
            urls.map(u => {
                return requestAsync(u).then(sm => {
                    const $ = cheerio.load(sm);
                    $("loc").each((i, e) => allUrls.push($(e).text()))
                    fs.writeFileSync('./lyrics.json', JSON.stringify(allUrls))
                    return allUrls;
                });
            });
        });
    });
}
The first part of the problem is the same: appending to JSON data using a write stream. But this time I also want to parse the HTML data and extract some text, and send only that text through the stream rather than the HTML data as a whole.
So let's split up the answers.
First of all, I'd try to keep the data as a stream and avoid accumulating it. So in essence, instead of loading the whole sitemap and then parsing it, I'd use something like xml-nodes so that the nodes come through as a separate stream. Then my module scramjet would come in to do the transformation:
const fs = require('fs');
const request = require('request');
const xmlNodes = require('xml-nodes');
const cheerio = require('cheerio');
const scramjet = require('scramjet');

// fs.createWriteStream is the actual API name
const writable = fs.createWriteStream('./sitemaps.json');
writable.write('[');
let first = 0;

request('http://example.com/sitemap.xml')
    // this fetches your sitemap
    .pipe(xmlNodes('loc'))
    // this extracts your "loc" nodes
    .pipe(new scramjet.DataStream())
    // this creates a mappable stream
    .map((nodeString) => cheerio.load(String(nodeString), {xmlMode: true})('loc').text())
    // this extracts the text as in your question
    .map((url) => (first++ ? ',' : '') + JSON.stringify(url))
    // this makes sure that strings are nicely escaped
    // and prepends them with a comma on every node, but first one
    .on('end', () => writable.end(']'))
    // when the mapped stream ends, this will close the array in sitemaps.json;
    // listening here rather than on the request avoids ending the file
    // before the last entries have been written
    .pipe(writable, {end: false});
    // and this will push all your entries to the writable stream
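The comma-prefixing idea above does not depend on scramjet. As a rough sketch, assuming only Node's built-in stream module, the same JSON array can be produced by a small Transform. The helper names arrayElementChunk and jsonArrayStringifier, the example URLs, and the temp-file path are made up for illustration and are not part of any library:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { Readable, Transform, pipeline } = require('stream');

// Hypothetical helper: the text emitted for one array element.
// The first element opens the array, every later one is preceded by a comma.
function arrayElementChunk(value, isFirst) {
    return (isFirst ? '[' : ',') + JSON.stringify(value);
}

// Hypothetical helper: a Transform that accepts values in object mode
// and outputs the text of a single JSON array, one element at a time.
function jsonArrayStringifier() {
    let isFirst = true;
    return new Transform({
        writableObjectMode: true,
        transform(value, _encoding, callback) {
            const chunk = arrayElementChunk(value, isFirst);
            isFirst = false;
            callback(null, chunk);
        },
        flush(callback) {
            // Close the array; an empty input still yields valid JSON.
            this.push(isFirst ? '[]' : ']');
            callback();
        }
    });
}

// Usage: the URLs stand in for the "loc" texts extracted from a sitemap.
const target = path.join(os.tmpdir(), 'sitemaps-demo.json');
pipeline(
    Readable.from(['http://example.com/a.xml', 'http://example.com/b.xml']),
    jsonArrayStringifier(),
    fs.createWriteStream(target),
    (err) => {
        if (err) throw err;
        // The file now holds one JSON array that JSON.parse can read back.
        console.log(JSON.parse(fs.readFileSync(target, 'utf8')));
    }
);
```

Because the closing bracket is written in flush, the array is only closed once every element has passed through, so no separate 'end' handler on the source is needed.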
For case 2 you'll need to do something similar, although if case 1 is only an intermediate step, then I'd suggest storing the data as lines of JSON (one JSON value per line) rather than as a single array. It's easier to stream that way.
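To illustrate the lines-of-JSON suggestion, here is a minimal sketch using only Node's built-in fs and readline modules. The helper names toJsonLine and fromJsonLine, the example URLs, and the temp-file path are assumptions made for this example:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const readline = require('readline');

// Hypothetical helper: serialize one value as a single line of JSON.
// JSON.stringify escapes newlines inside strings, so each value stays on one line.
function toJsonLine(value) {
    return JSON.stringify(value) + '\n';
}

// Hypothetical helper: parse one line back into a value.
function fromJsonLine(line) {
    return JSON.parse(line);
}

const target = path.join(os.tmpdir(), 'sitemaps-demo.ndjson');
const out = fs.createWriteStream(target);

// Appending is just another write: no brackets or commas to keep track of.
for (const url of ['http://example.com/a.xml', 'http://example.com/b.xml']) {
    out.write(toJsonLine(url));
}
out.end();

out.on('finish', () => {
    // Read the file back one entry at a time instead of parsing it whole.
    const lines = readline.createInterface({
        input: fs.createReadStream(target),
        crlfDelay: Infinity
    });
    lines.on('line', (line) => {
        if (line) console.log(fromJsonLine(line));
    });
});
```

With this layout the second step can start processing entries as soon as the first lines arrive, instead of waiting for a closing bracket before JSON.parse can run.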