Node.js: wget, unzip and convert to js without writing to file

As the title says, I'm trying to write a script (running in a Node.js/Express server-side application) that uses the request, unzip and xml2js libraries to fetch a zip file from a given URL. The zip contains an XML file, which I need to parse into a JavaScript object for further processing.

So far I've managed to come up with:

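var express = require("express");
var app = express();
/* some init code omitted */
var fs = require("fs");     // used below for createWriteStream / readFile
var util = require("util"); // used below for util.inspect
var request = require("request");
var unzip = require("unzip");
var xml2js = require("xml2js");
var parser = new xml2js.Parser();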

app.get("/import", function(req, res) {
    request("http://path.to/file.zip")
        .pipe(unzip.Parse())
        .on("entry", function(entry) {
            //This is what I'm trying to avoid, which doesn't even work
            entry.pipe(fs.createWriteStream(entry.path));
            fs.readFile(entry.path, function(err, data) {
                if(err) {
                    return res.status(500).send(err);
                }
                parser.parseString(data, function(err, obj) {
                    console.log(util.inspect(obj));
                    /* further processing of obj */
                });
            });
        });
});

Although the contents of the XML file are correctly written to disk, I'm looking for an alternative to this approach for two reasons:

  1. to save disk space, since I don't need to keep the XML file once it has been converted to js
  2. it doesn't even work: fs.readFile probably starts reading the file before fs.createWriteStream has finished writing it, because the line console.log(util.inspect(obj)) logs null (whereas if I run only the innermost fs.readFile block and replace entry.path with the name of the previously written file, it produces the desired output)
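
The ordering problem described in point 2 can be illustrated with Node's built-in modules alone. The sketch below waits for the write stream's "finish" event before reading the file back. It only addresses the race, it still writes to disk, and writeThenRead, the temporary file name and the in-memory stream standing in for a zip entry are all assumptions made for illustration:

```javascript
// Sketch only: uses Node built-ins; writeThenRead is a hypothetical helper.
const fs = require("fs");
const os = require("os");
const path = require("path");
const { Readable } = require("stream");

function writeThenRead(source, filePath, callback) {
    const out = fs.createWriteStream(filePath);
    out.on("error", callback);
    // "finish" fires once all piped data has been flushed to the file,
    // so reading here cannot race with the write.
    out.on("finish", function () {
        fs.readFile(filePath, "utf8", callback);
    });
    source.pipe(out);
}

// Usage: an in-memory stream stands in for the zip entry.
const tmpFile = path.join(os.tmpdir(), "entry-demo.xml");
writeThenRead(Readable.from(["<a>", "b</a>"]), tmpFile, function (err, data) {
    if (err) throw err;
    console.log(data); // the full contents, read after the write completed
    fs.unlinkSync(tmpFile);
});
```

Calling fs.readFile immediately after pipe(), as in the handler above, gives no such guarantee, which would be consistent with the null result.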

I wish I could put together a jsFiddle for this, but I don't know how to do that for an Express application. Cheers.

EDITED

Piping the entry to a file is unnecessary; parse the data directly from the entry stream:

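app.get("/import", function(req, res) {
    request("http://link-top.zip")
        .pipe(unzip.Parse())
        .on("entry", function(entry) {
            if (entry.path !== "needed.xml") {
                // Entries that are not consumed must be drained, or the parser stalls
                entry.autodrain();
                return;
            }
            var chunks = [];
            entry.on("data", function(chunk) {
                // Keep raw Buffers; decoding per chunk can split multi-byte characters
                chunks.push(chunk);
            });
            entry.on("end", function() {
                // Named xml rather than res, so the Express response is not shadowed
                var xml = Buffer.concat(chunks).toString();
                parser.parseString(xml, function(err, obj) {
                    if (err) {
                        return res.status(500).send(err);
                    }
                    console.log(util.inspect(obj));
                    /* further processing of obj */
                });
            });
        });
});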
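
When collecting chunks from a stream such as the zip entry, one option is to keep them as Buffers and decode once at the end, so that a multi-byte character split across two chunks is not corrupted. A minimal sketch using only Node's built-in modules follows; collectStream is a hypothetical helper name, UTF-8 input is an assumption, and the in-memory stream stands in for the entry:

```javascript
// Sketch only: collectStream is a hypothetical helper, not part of any library.
const { Readable } = require("stream");

function collectStream(stream, callback) {
    const chunks = [];
    stream.on("data", function (chunk) {
        // Normalise to Buffer so string and Buffer chunks can be concatenated
        chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
    });
    stream.on("error", callback);
    stream.on("end", function () {
        // Decode once, after all bytes have arrived (UTF-8 assumed)
        callback(null, Buffer.concat(chunks).toString("utf8"));
    });
}

// "é" is two bytes in UTF-8; split it across two chunks on purpose.
const bytes = Buffer.from("<name>é</name>", "utf8");
const source = Readable.from([bytes.slice(0, 7), bytes.slice(7)]);
collectStream(source, function (err, xml) {
    if (err) throw err;
    console.log(xml); // prints "<name>é</name>"
});
```

The resulting string can then be handed to parser.parseString in the same way as in the handler above.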
