Streaming files from AWS S3 using NodeJS

Question

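I am trying to stream the data from a very large csv file into readline. I tried piping the readStream from s3 into the readline input, but I ran into an error: S3 only allows a connection to stay open for a certain amount of time.

I am creating the stream from s3 like so: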

import * as AWS from 'aws-sdk';
import {Readable} from 'stream';
import {s3Env} from '../config';

export default async function createAWSStream(): Promise<Readable> {
    return new Promise((resolve, reject) => {
        const params = {
            Bucket: s3Env.bucket,
            Key: s3Env.key
        };

        try {
            const s3 = new AWS.S3({
                accessKeyId: s3Env.accessKey,
                secretAccessKey: s3Env.secret
            });

            // Confirm the object exists (and get its metadata) before opening the stream
            s3.headObject(params, (error, data) => {
                if (error) {
                    // A throw inside this callback would escape the surrounding try/catch, so reject instead
                    reject(error);
                    return;
                }

                const stream = s3.getObject(params).createReadStream();

                resolve(stream);
            })
        } catch (error) {
            reject(error);
        }
    })
}

Then I feed it into readline:

import * as readline from 'readline';
import createAWSStream from './createAWSStream';

export const readCSVFile = async function(): Promise<void> {
  const rStream = await createAWSStream();

  const lineReader = readline.createInterface({
    input: rStream
  });

  for await (const line of lineReader) {
    // process line
  }
}

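I found that the timeout for the s3 connection is set to 120000 ms (2 minutes). I tried simply raising the timeout, but then I ran into more timeout issues from the HTTPS connection.

How can I stream data from AWS S3 the right way, without setting a bunch of timeouts to some extremely large value?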

Answer 1

I was able to work out a solution for this using the AWS S3 Range property and creating a custom readable stream with the NodeJS Stream API.

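By using this "smart stream", I was able to fetch the data in chunks, with a separate request to the S3 instance for each chunk. Fetching the data in chunks avoided the timeout errors and produced a more efficient stream. The NodeJS Readable super class handles the buffer so as not to overload the input to readline. It also automatically handles pausing and resuming the stream.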
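To make the chunking idea concrete, here is a minimal sketch of how an object can be split into Range header values. The helper name `byteRanges` is hypothetical and not part of the class in this answer; the sketch only assumes that HTTP byte ranges are inclusive at both ends, so a file of N bytes covers offsets 0 to N - 1.

```typescript
// Hypothetical helper for illustration only (not part of the answer's class).
// Returns the Range header values needed to cover an object of `totalBytes`
// bytes in chunks of at most `chunkBytes` bytes, without overlap.
function byteRanges(totalBytes: number, chunkBytes: number): string[] {
    if (!Number.isInteger(totalBytes) || totalBytes < 0) {
        throw new RangeError("totalBytes must be a non-negative integer");
    }
    if (!Number.isInteger(chunkBytes) || chunkBytes <= 0) {
        throw new RangeError("chunkBytes must be a positive integer");
    }

    const ranges: string[] = [];
    for (let start = 0; start < totalBytes; start += chunkBytes) {
        // Both ends of an HTTP byte range are inclusive, hence the "- 1"
        const end = Math.min(start + chunkBytes, totalBytes) - 1;
        ranges.push(`bytes=${start}-${end}`);
    }
    return ranges;
}
```

For a 10 byte object read 4 bytes at a time, `byteRanges(10, 4)` gives `bytes=0-3`, `bytes=4-7` and `bytes=8-9`. Each value would go into the Range parameter of its own getObject request.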

This class makes it easy to stream large files from AWS S3:

import {Readable, ReadableOptions} from "stream";
import type {S3} from "aws-sdk";

export class SmartStream extends Readable {
    _currentCursorPosition = 0; // Holds the current starting position for our range queries
    _s3DataRange = 2048 * 1024; // Amount of bytes to grab (I have jacked this up for HD video files)
    _maxContentLength: number; // Total number of bytes in the file
    _s3: S3; // AWS.S3 instance
    _s3StreamParams: S3.GetObjectRequest; // Parameters passed into s3.getObject method

    constructor(
        parameters: S3.GetObjectRequest,
        s3: S3,
        maxLength: number,
        // You can pass any ReadableStream options to the NodeJS Readable super class here
        // For this example we won't use this, however I left it in to be more robust
        nodeReadableStreamOptions?: ReadableOptions
    ) {
        super(nodeReadableStreamOptions);
        this._maxContentLength = maxLength;
        this._s3 = s3;
        this._s3StreamParams = parameters;
    }

    _read() {
        if (this._currentCursorPosition >= this._maxContentLength) {
            // Valid byte offsets run from 0 to length - 1, so reaching the length means everything has been read
            // We push null into the buffer, NodeJS ReadableStream will see this as the end of file (EOF) and emit the 'end' event
            this.push(null);
        } else {
            // Calculate the range of bytes we want to grab
            const range = this._currentCursorPosition + this._s3DataRange;
            // If the range is greater than the total number of bytes in the file
            // We adjust the range to grab the remaining bytes of data
            const adjustedRange =
                range < this._maxContentLength ? range : this._maxContentLength;
            // Set the Range property on our s3 stream parameters
            this._s3StreamParams.Range = `bytes=${this._currentCursorPosition}-${adjustedRange}`;
            // Update the current range beginning for the next go
            this._currentCursorPosition = adjustedRange + 1;
            // Grab the range of bytes from the file
            this._s3.getObject(this._s3StreamParams, (error, data) => {
                if (error) {
                    // If we encounter an error grabbing the bytes
                    // We destroy the stream, NodeJS ReadableStream will emit the 'error' event
                    this.destroy(error);
                } else {
                    // We push the data into the stream buffer
                    this.push(data.Body);
                }
            });
        }
    }
}

To use this in the createAWSStream function, I simply replaced the line that creates the readStream:

const stream = s3.getObject(params).createReadStream();

with one that creates an instance of my SmartStream class, passing in the s3 params object, the s3 instance, and the content length of the data.

const stream = new SmartStream(params, s3, data.ContentLength);
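
For anyone who wants to try the pattern without a bucket, here is a rough sketch of the same idea with an in-memory buffer standing in for S3. Everything in it (`ChunkedReadable`, `FetchRange`, `fetchFromMemory`, `collectLines`) is a hypothetical name made up for this illustration, not part of aws-sdk or of the class above; only `Readable` and `readline.createInterface` are real NodeJS APIs.

```typescript
import { Readable } from "stream";
import * as readline from "readline";

// Hypothetical stand-in for one S3 range request: resolves with the bytes
// from `start` to `end`, both inclusive.
type FetchRange = (start: number, end: number) => Promise<Buffer>;

// Same approach as SmartStream, but the data source is injected so the
// class can be exercised without AWS credentials.
class ChunkedReadable extends Readable {
    private cursor = 0;
    private readonly fetchRange: FetchRange;
    private readonly totalBytes: number;
    private readonly chunkBytes: number;

    constructor(fetchRange: FetchRange, totalBytes: number, chunkBytes: number) {
        super();
        this.fetchRange = fetchRange;
        this.totalBytes = totalBytes;
        this.chunkBytes = chunkBytes;
    }

    _read(): void {
        if (this.cursor >= this.totalBytes) {
            this.push(null); // Nothing left: signal end of stream
            return;
        }
        const start = this.cursor;
        const end = Math.min(start + this.chunkBytes, this.totalBytes) - 1;
        this.cursor = end + 1; // Advance before the asynchronous fetch completes
        this.fetchRange(start, end).then(
            (chunk) => this.push(chunk),
            (error: Error) => this.destroy(error)
        );
    }
}

// Reads every line from a stream through readline, as readCSVFile does.
async function collectLines(input: Readable): Promise<string[]> {
    const lines: string[] = [];
    const lineReader = readline.createInterface({ input, crlfDelay: Infinity });
    for await (const line of lineReader) {
        lines.push(line);
    }
    return lines;
}

// In-memory "object" used in place of a real bucket (assumption for the demo)
const csv = Buffer.from("id,name\n1,alpha\n2,beta\n");
const fetchFromMemory: FetchRange = async (start, end) => csv.subarray(start, end + 1);
```

Calling `collectLines(new ChunkedReadable(fetchFromMemory, csv.length, 5))` should resolve with the three lines of the buffer even though the 5 byte chunks cut through the middle of lines, because readline reassembles partial lines across chunks. Swapping `fetchFromMemory` for a function that wraps a ranged getObject call would bring it back to the S3 case.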
