
Node.js Script Failing Silently?

I've written a Node.js script that uses the download, axios, and fs modules to extract URLs from JSON provided by the Federal Register and download the associated PDF files. However, the script routinely fails to download all of the PDFs.

For whatever reason, my script stalls before downloading all of the PDF files. It starts off fine (downloading maybe 70 or 80 files) but then stalls. It doesn't fire my catch block or fail in any visible way. It just stops downloading.

The number of files downloaded varies with the wifi connection I'm on. However, I've never been able to get the code to finish and fire the .then block. Ideally, I would like to use the .then block to process the files once they are downloaded.

Here is the code:

// The callback function that writes the file...
function writeFile(path, contents, cb) {
  mkdirp(getDirName(path), function (err) {
    if (err) return cb(err);
    fs.writeFile(path, contents, cb);
  });
}

// The function that gets the JSON...
axios.get(`http://federalregister.gov/api/v1/public-inspection-documents.json?conditions%5Bavailable_on%5D=${today}`)
  .then(downloadPDFS)
  .catch((err) => {
    console.log("COULD NOT DOWNLOAD FILES: \n", err);
  });

// The function that downloads the data and triggers my write callback...
function downloadPDFS(res) {
  const downloadPromises = res.data.results.map(item => (
    download(item.pdf_url)
      .then(data => new Promise((resolve, reject) => {
        writeFile(`${__dirname}/${today}/${item.pdf_file_name}`, data, (err) => {
          if(err) reject(err);
          else resolve(console.log("FILE WRITTEN: ", item.pdf_file_name));
        });
      }))
  ))
  return Promise.all(downloadPromises).then((res) => console.log("DONE"))
}

My project is on GitHub, in case you'd like to install it and try for yourself. Here's a summary of what's going on, in plain English:

The script fetches JSON from a server; the JSON contains the URLs of all 126 PDFs. It then passes an array of these URLs to the synchronous map function. Each URL is turned into a promise by the download module. That promise is implicitly returned and collected in the array passed to Promise.all. When a download promise resolves (the document has finished downloading), my custom writeFile function is triggered and writes the PDF file with the downloaded data. When all of the files have been downloaded and written, Promise.all should resolve. But that doesn't happen.
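
To make that fan-out concrete, here is a minimal sketch that uses a stand-in for the download call. The fakeDownload function and the URL list are made up for illustration. The point is only that map starts every task before any of them finishes, so all requests are in flight at the same time.

```javascript
// Hypothetical stand-in for download(url): resolves after a short delay
// and tracks how many calls are in flight at once.
let inFlight = 0;
let peak = 0;
const fakeDownload = url => {
    inFlight += 1;
    peak = Math.max(peak, inFlight);
    return new Promise(resolve => setTimeout(() => {
        inFlight -= 1;
        resolve(`contents of ${url}`);
    }, 10));
};

// Made-up URLs standing in for the pdf_url values in the JSON.
const urls = Array.from({ length: 20 }, (_, i) => `https://example.invalid/${i}.pdf`);

// map() is synchronous, so every fakeDownload call has already started
// by the time Promise.all receives the array.
const started = urls.map(fakeDownload);
const peakAfterMap = peak;

const done = Promise.all(started).then(results => {
    console.log(`downloaded ${results.length}, peak in flight: ${peak}`);
    return results;
});
```

With the real script, the same pattern means one request per PDF is opened immediately, which is the kind of load a rate-limited server may throttle.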

What is going wrong?

EDIT:

As the image below shows, the script runs for a while, but then it stalls and doesn't download any more files.

[image]
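
One possible reason neither .then nor .catch fires: Promise.all stays pending for as long as any single input promise stays pending, so one request that hangs without erroring stalls the whole batch silently. Below is a minimal sketch of surfacing that as an error. The withTimeout helper is hypothetical (it is not part of download or axios), and the two promises are stand-ins for real downloads.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms`.
const withTimeout = (promise, ms, label) => {
    let timer;
    const timeout = new Promise((_, reject) => {
        timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
    });
    // Whichever settles first wins; the timer is cleared either way.
    return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
};

// Stand-ins for downloads: one finishes, one never settles (a hung request).
const finishes = new Promise(resolve => setTimeout(() => resolve('ok'), 10));
const hangs = new Promise(() => {});

// Without the timeout wrapper this Promise.all would stay pending forever.
const result = Promise.all([
    withTimeout(finishes, 200, 'first.pdf'),
    withTimeout(hangs, 200, 'second.pdf'),
]).then(
    () => 'resolved',
    err => {
        console.log('CAUGHT:', err.message);
        return err.message;
    }
);
```

The timeout only stops waiting for the promise. It does not cancel the underlying request.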

If it really is a rate-limiting issue, there are a few ways to solve it, depending on how the API is rate limited.

The code below offers three solutions in one:

rateLimited ... fires off requests at no more than a given number per second

singleQueue ... one request at a time, no rate limit, all requests run in series

multiQueue ... at most a given number of requests "in flight" at a time

// Resolves after the given number of milliseconds.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));
const rateLimited = perSecond => {
    perSecond = isNaN(perSecond) || perSecond < 0.0001 ? 0.0001 : perSecond;
    const milliSeconds = Math.floor(1000 / perSecond);
    // Earliest time (ms since epoch) at which the next task may start.
    let nextStart = 0;
    const add = fn => {
        const now = Date.now();
        const startAt = Math.max(now, nextStart);
        nextStart = startAt + milliSeconds;
        return delay(startAt - now).then(fn);
    };
    return add;
};
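// Runs tasks strictly one after another by chaining each onto the previous one.
const singleQueue = () => {
    let q = Promise.resolve();
    return fn => q = q.then(fn);
};
// Keeps `length` independent chains ("lanes") and assigns tasks to them
// round-robin, so at most `length` tasks run at the same time.
const multiQueue = length => {
    length = isNaN(length) || length < 1 ? 1 : length;
    const q = Array.from({ length }, () => Promise.resolve());
    let index = 0;
    const add = fn => {
        index = (index + 1) % length;
        return q[index] = q[index].then(fn);
    };
    return add;
};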

// uncomment one, and only one, of the three "fixup" lines below
let fixup = rateLimited(10); // 10 per second for example
//let fixup = singleQueue(); // one at a time
//let fixup = multiQueue(6); // at most 6 at a time for example

const writeFile = (path, contents) => new Promise((resolve, reject) => {
    mkdirp(getDirName(path), err => {
        if (err) return reject(err);
        fs.writeFile(path, contents, err => {
            if (err) return reject(err);
            resolve();
        })
    })
});


axios.get(`http://federalregister.gov/api/v1/public-inspection-documents.json?conditions%5Bavailable_on%5D=${today}`)
    .then(downloadPDFS)
    .catch((err) => {
        console.log("COULD NOT DOWNLOAD FILES: \n", err);
    });

function downloadPDFS(res) {
    const downloadPromises = res.data.results.map(item => fixup(() => 
        download(item.pdf_url)
        .then(data => writeFile(`${__dirname}/${today}/${item.pdf_file_name}`, data))
        .then(() => console.log("FILE WRITTEN: ", item.pdf_file_name))
    ));
    return Promise.all(downloadPromises).then(() => console.log("DONE"));
}

I've also refactored the code a bit so that downloadPDFS uses promises only; all the Node callback-style code now lives in writeFile.
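
As a quick check of the "at most a given number in flight" behaviour, the sketch below runs the multiQueue helper against made-up tasks and records the peak number running at once. The fakeTask function and the counts are illustrative only; no real downloads are involved.

```javascript
// Same multiQueue helper as above, repeated so this snippet runs on its own.
const multiQueue = length => {
    length = isNaN(length) || length < 1 ? 1 : length;
    const q = Array.from({ length }, () => Promise.resolve());
    let index = 0;
    return fn => {
        index = (index + 1) % length;
        return q[index] = q[index].then(fn);
    };
};

let running = 0;
let peakRunning = 0;
// Hypothetical task: pretends to download for 20 ms.
const fakeTask = id => () => {
    running += 1;
    peakRunning = Math.max(peakRunning, running);
    return new Promise(resolve => setTimeout(() => {
        running -= 1;
        resolve(id);
    }, 20));
};

const limit = multiQueue(3);
const tasks = Array.from({ length: 10 }, (_, id) => limit(fakeTask(id)));

const finished = Promise.all(tasks).then(ids => {
    console.log(`finished ${ids.length} tasks, peak running: ${peakRunning}`);
    return ids;
});
```

Because tasks are assigned to lanes round-robin, a slow task delays everything queued behind it in the same lane even if other lanes are idle. Also note that a rejected task rejects every task queued after it in that lane.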

As Jaromanda pointed out, this is likely caused by the API limiting my access rather than by an error in the script.

I added a filter to the script to select less data, and it works:

axios.get(`http://federalregister.gov/api/v1/public-inspection-documents.json?conditions%5Bavailable_on%5D=${today}`)
  .then(downloadPDFS)
  .then(() => {
    console.log("DONE")
  })
  .catch((err) => {
    console.log("COULD NOT DOWNLOAD FILES: \n", err);
  });

function downloadPDFS(res) {
  const EPA = res.data.results.filter((item) => {
    return item.agencies[0].raw_name === "ENVIRONMENTAL PROTECTION AGENCY"; //// THIS FILTER
  });

  const downloadPromises = EPA.map(item => ( //// ONLY DOWNLOADING SOME OF THE DATA
    download(item.pdf_url)
      .then(data => new Promise((resolve, reject) => {
        writeFile(`${__dirname}/${today}/${item.pdf_file_name}`, data, (err) => {
          if(err) reject(err);
          else resolve(console.log("FILE WRITTEN: ", item.pdf_file_name));
        });
      }))
  ))
  return Promise.all(downloadPromises)
}
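
Note that item.agencies[0] only looks at the first listed agency and throws if the array is empty or missing. A slightly more defensive variant is sketched below against made-up sample records. The field names follow the code above, the sample values are hypothetical, and the helper name isFromAgency is an assumption. Unlike the original filter, it matches an agency at any position in the list.

```javascript
const TARGET = 'ENVIRONMENTAL PROTECTION AGENCY';

// True if any listed agency matches; tolerates a missing or empty array.
const isFromAgency = (item, name) =>
    Array.isArray(item.agencies) &&
    item.agencies.some(agency => Boolean(agency) && agency.raw_name === name);

// Made-up records shaped like the entries in res.data.results.
const sample = [
    { pdf_file_name: 'a.pdf', agencies: [{ raw_name: TARGET }] },
    { pdf_file_name: 'b.pdf', agencies: [{ raw_name: 'OTHER AGENCY' }, { raw_name: TARGET }] },
    { pdf_file_name: 'c.pdf', agencies: [] },
    { pdf_file_name: 'd.pdf' },
];

const selected = sample.filter(item => isFromAgency(item, TARGET));
console.log(selected.map(item => item.pdf_file_name)); // [ 'a.pdf', 'b.pdf' ]
```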
