
How to download files with node-fetch

I need help implementing a file downloader in nodejs.

So I need to download over 25,000 files from a server. I'm using node-fetch, but I don't know exactly how to implement this. I tried using Promise.allSettled(), but I also need a way to limit the number of concurrent requests to the server; otherwise I get rate-limited.

This is my code so far:

const fetch = require('node-fetch')

async function main () {
  const urls = [
    'https://www.example.com/foo.png',
    'https://www.example.com/bar.gif',
    'https://www.example.com/baz.jpg',
    // ... many more (~25k)
  ]

  // how to save each file on the machine with same file name and extension?
  // how to limit the amount of concurrent requests to the server?
  const files = await Promise.allSettled(
    urls.map((url) => fetch(url))
  )
}

main()

So my questions are:

  • How do I limit the number of concurrent requests to the server? Can this be solved by using a custom https agent with node-fetch and setting maxSockets to something like 10?
  • How do I check whether the file exists on the server and, if it does, download it to my machine with the same file name and extension?

It would be very helpful if someone could show a small code example of how I would implement such functionality.

Thanks in advance.

To control how many simultaneous requests are running at once, you can use any of these three options:

mapConcurrent() and pMap(): These let you iterate over an array, sending a request for each item, while making sure that only N requests are ever in flight at the same time, where you choose the value of N.

rateLimitMap(): Lets you control how many requests per second are sent.
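A minimal sketch of what a concurrency-limited map does, in the spirit of mapConcurrent() but not the linked implementation (the name and the `(items, maxConcurrent, fn)` signature here are assumptions). It runs a fixed number of "workers" that each pull the next unclaimed item, so no more than `maxConcurrent` calls are ever pending, and it reports results like Promise.allSettled() so one failed download doesn't abort the rest:

```javascript
// Hypothetical helper in the spirit of mapConcurrent(): run fn(item, index)
// for every item with at most `maxConcurrent` calls pending at once.
// Results are reported like Promise.allSettled(), in the order of `items`.
async function mapConcurrent(items, maxConcurrent, fn) {
  const results = new Array(items.length);
  // Shared cursor; safe without locks because JS callbacks run one at a time.
  let nextIndex = 0;

  // Each worker claims the next unclaimed item, waits for it, then repeats.
  async function worker() {
    while (nextIndex < items.length) {
      const i = nextIndex++;
      try {
        results[i] = { status: 'fulfilled', value: await fn(items[i], i) };
      } catch (reason) {
        results[i] = { status: 'rejected', reason };
      }
    }
  }

  // Start at most maxConcurrent workers and wait until all have drained the list.
  const workerCount = Math.min(maxConcurrent, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

In the question's code this would take the place of `Promise.allSettled(urls.map(...))`, e.g. `const files = await mapConcurrent(urls, 10, (url) => fetch(url));`. Unlike `urls.map((url) => fetch(url))`, which starts all ~25k requests immediately, a new request is only started when a previous one has finished.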
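If the server's limit is expressed in requests per second rather than simultaneous connections, a rate limit fits better. A rough sketch, with the hypothetical name and signature `rateLimitMap(items, requestsPerSecond, fn)` (not the linked helper's code): it spaces out the start of each call but does not wait for earlier calls to finish, so a slow server can still accumulate many in-flight requests.

```javascript
// Hypothetical helper (not the linked rateLimitMap()): start fn(item, index)
// no faster than `requestsPerSecond`, without waiting for earlier calls to
// finish. Results are reported like Promise.allSettled(), in input order.
async function rateLimitMap(items, requestsPerSecond, fn) {
  const intervalMs = 1000 / requestsPerSecond;
  const pending = [];
  for (let i = 0; i < items.length; i++) {
    if (i > 0) {
      // Pause before each start so starts are spaced intervalMs apart.
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
    pending.push(
      Promise.resolve()
        .then(() => fn(items[i], i))
        // Settle immediately so a rejection is never left unhandled
        // while the loop is still waiting on its timer.
        .then(
          (value) => ({ status: 'fulfilled', value }),
          (reason) => ({ status: 'rejected', reason }),
        ),
    );
  }
  return Promise.all(pending);
}
```

Timers only guarantee a minimum delay, so the real rate comes out at or slightly below `requestsPerSecond`, which is the safe direction when you are trying to stay under a server's limit.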

Can this be solved using a custom https agent with node-fetch and setting the maxSockets to something like 10?

I'm not aware of any solution using a custom https agent.

How do I check whether the file exists on the server and, if it does, download it to my machine with the same file name and extension?

You can't directly access a remote HTTP server's file system. All you can do is make an HTTP request for a specific resource (a URL) and examine the HTTP response to see whether it returned data or some sort of HTTP error such as a 404.
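A small sketch of such a check, assuming Node.js 18+ (the built-in fetch rather than node-fetch) and a server that supports HEAD requests; `remoteFileExists` is a hypothetical name. A HEAD request returns the status and headers without transferring the body:

```javascript
// Sketch: ask the server whether a URL exists without downloading the body.
// Assumes Node.js 18+ (global fetch) and a server that answers HEAD requests.
async function remoteFileExists(url) {
  const response = await fetch(url, { method: 'HEAD' });
  if (response.status === 404) return false; // not on the server
  if (response.ok) return true;              // 2xx: the resource exists
  // Anything else (403, 429, 500, or 405 from servers without HEAD support)
  // means neither "exists" nor "missing", so let the caller decide.
  throw new Error(`unexpected response ${response.status} for ${url}`);
}
```

For 25,000 files, a separate HEAD request doubles the number of requests sent to a rate-limited server, so it is usually cheaper to send the GET directly and look at `response.status` before saving the body.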

As for file names and extensions, that depends on whether you already know what to request (and the server accepts the file name as part of the URL) or whether the server tells you the name in an HTTP header (typically Content-Disposition). If you're requesting a specific file name and extension, you can simply create a local file with that name and extension and save the HTTP response data to it.
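A sketch of picking the local file name, with `localFileName` as a hypothetical helper: it uses the `filename=` value from a Content-Disposition header when one is given (simplified parsing; the RFC 6266 `filename*=` form is not handled), otherwise the last segment of the URL path. `path.basename()` also strips directory parts, so a name like `../../x` can't escape the download folder:

```javascript
const path = require('node:path');

// Hypothetical helper: choose a local file name for a downloaded URL.
function localFileName(url, contentDisposition) {
  if (contentDisposition) {
    // Simplified: matches filename="name" or filename=name, not filename*=.
    const match = /filename="?([^";]+)"?/i.exec(contentDisposition);
    if (match) {
      // basename() drops any directory parts the server might send.
      return path.basename(match[1]);
    }
  }
  // Fall back to the URL path: the query string is ignored and %-escapes
  // are decoded, e.g. '/img/foo%20bar.png?v=2' -> 'foo bar.png'.
  return path.basename(decodeURIComponent(new URL(url).pathname));
}
```

With node-fetch or the built-in fetch, the header value is available as `response.headers.get('content-disposition')`, which returns null when the header is absent.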

As for code examples, the node-fetch documentation shows how to download data to a file using streams: https://www.npmjs.com/package/node-fetch#streams

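// node-fetch v3 is ESM-only, and top-level await needs an ES module
// (a .mjs file, or "type": "module" in package.json).
import {createWriteStream} from 'fs';
import {pipeline} from 'stream';
import {promisify} from 'util';
import fetch from 'node-fetch';

const streamPipeline = promisify(pipeline);

const testUrl = 'https://github.githubassets.com/images/modules/logos_page/Octocat.png';
const response = await fetch(testUrl);

// A 404 (file missing on the server) or any other non-2xx status ends up here.
if (!response.ok) throw new Error(`unexpected response ${response.statusText}`);

// Stream the body straight to disk instead of buffering it in memory.
await streamPipeline(response.body, createWriteStream('./octocat.png'));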
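For the question's case, here is a sketch of a single-file download that keeps the URL's file name and skips 404s. It assumes Node.js 18+ and uses the built-in fetch() instead of node-fetch, so the web stream in `response.body` is converted with `Readable.fromWeb()`; with node-fetch v3, `response.body` is already a Node.js stream and can be passed to `pipeline()` directly. `downloadFile` is a hypothetical name.

```javascript
// Sketch: download one URL into destDir, keeping the remote file name.
// Assumes Node.js 18+ (built-in fetch) rather than node-fetch.
const fs = require('node:fs');
const path = require('node:path');
const { Readable } = require('node:stream');
const { pipeline } = require('node:stream/promises');

// Resolves to the saved file path, or null if the server answered 404.
async function downloadFile(url, destDir) {
  const response = await fetch(url);
  if (response.status === 404) {
    await response.body?.cancel(); // discard the error page, free the connection
    return null;
  }
  if (!response.ok) {
    throw new Error(`unexpected response ${response.status} for ${url}`);
  }
  // e.g. https://www.example.com/foo.png -> <destDir>/foo.png
  const fileName = path.basename(new URL(url).pathname);
  const destPath = path.join(destDir, fileName);
  // Convert the web stream to a Node.js stream and write it to disk.
  await pipeline(Readable.fromWeb(response.body), fs.createWriteStream(destPath));
  return destPath;
}
```

To cap concurrency for all ~25k URLs, hand it to a limiter such as p-map (ESM-only, so `import pMap from 'p-map'`): `const saved = await pMap(urls, (url) => downloadFile(url, './downloads'), { concurrency: 10 });`. The target folder has to exist first, e.g. `fs.mkdirSync('./downloads', { recursive: true })`.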

Personally, I wouldn't use node-fetch, because its design center is to mimic the browser's fetch() API, which is not as friendly an API as similar libraries built explicitly for nodejs. I use got(), and there are several other good request libraries to choose from. You can pick your favorite.

Here's a code example using the got() library:

import {promisify} from 'node:util';
import stream from 'node:stream';
import fs from 'node:fs';
import got from 'got';

const pipeline = promisify(stream.pipeline);

await pipeline(
    got.stream('https://sindresorhus.com'),
    fs.createWriteStream('index.html')
);
