
Streaming files from AWS S3 with NodeJS

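I am trying to stream data from a large CSV file into readline. I tried piping the readStream from S3 into the readline input, but I ran into an error: S3 only allows a connection to stay open for a limited amount of time.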

I am creating the stream from S3 like so:

import * as AWS from 'aws-sdk';
import {Readable} from 'stream';
import {s3Env} from '../config';

export default async function createAWSStream(): Promise<Readable> {
    return new Promise((resolve, reject) => {
        const params = {
            Bucket: s3Env.bucket,
            Key: s3Env.key
        };

        try {
            const s3 = new AWS.S3({
                accessKeyId: s3Env.accessKey,
                secretAccessKey: s3Env.secret
            });

            s3.headObject(params, (error, data) => {
                if (error) {
                    // Throwing inside the callback would bypass the try/catch, so reject instead
                    reject(error);
                    return;
                }

                const stream = s3.getObject(params).createReadStream();

                resolve(stream);
            })
        } catch (error) {
            reject(error);
        }
    })
}

Then I feed it into readline:

import * as readline from 'readline';
import createAWSStream from './createAWSStream';

export const readCSVFile = async function(): Promise<void> {
  const rStream = await createAWSStream();

  const lineReader = readline.createInterface({
    input: rStream
  });

  for await (const line of lineReader) {
    // process line
  }
}

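I found that the timeout for the S3 connection is set to 120000 ms (2 minutes). I tried simply raising the timeout, but then I ran into more timeout issues from the HTTPS connection.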

How can I stream data from AWS S3 the right way, without having to set a bunch of timeouts to some very large time frame?

I was able to work out a solution for this using the AWS S3 Range property and by creating a custom readable stream with the NodeJS Stream API.

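By using this "smart stream" I was able to grab the data in chunks, each in a separate request to the S3 instance. By grabbing the data in chunks, I avoided any timeout errors and created a more efficient stream. The NodeJS Readable super class handles the buffer so that the input to readline is not overloaded. It also automatically handles pausing and resuming the stream.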
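To make the chunking concrete, here is a minimal sketch of how a file can be split into Range header values. `byteRanges` is a hypothetical helper that is not part of the original answer, and it assumes fixed-size chunks. HTTP byte ranges are inclusive on both ends, so a 4-byte chunk starting at offset 0 is `bytes=0-3`.

```typescript
// Hypothetical helper (an assumption for illustration, not from the original answer):
// lists the inclusive byte ranges needed to cover `totalBytes` in pieces of `chunkBytes`.
function byteRanges(totalBytes: number, chunkBytes: number): string[] {
    if (chunkBytes <= 0) {
        throw new RangeError("chunkBytes must be positive");
    }
    const ranges: string[] = [];
    for (let start = 0; start < totalBytes; start += chunkBytes) {
        // Ranges are inclusive, so the last byte of a chunk is (exclusive end - 1)
        const end = Math.min(start + chunkBytes, totalBytes) - 1;
        ranges.push(`bytes=${start}-${end}`);
    }
    return ranges;
}

console.log(byteRanges(10, 4)); // [ 'bytes=0-3', 'bytes=4-7', 'bytes=8-9' ]
```

Each of these values could be sent as the `Range` parameter of a separate request, so no single connection has to stay open for the whole file.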

This class makes it easy to stream large files from AWS S3:

import {Readable, ReadableOptions} from "stream";
import type {S3} from "aws-sdk";

export class SmartStream extends Readable {
    _currentCursorPosition = 0; // Holds the current starting position for our range queries
    _s3DataRange = 2048 * 1024; // Amount of bytes to grab (I have jacked this up for HD video files)
    _maxContentLength: number; // Total number of bytes in the file
    _s3: S3; // AWS.S3 instance
    _s3StreamParams: S3.GetObjectRequest; // Parameters passed into s3.getObject method

    constructor(
        parameters: S3.GetObjectRequest,
        s3: S3,
        maxLength: number,
        // You can pass any ReadableStream options to the NodeJS Readable super class here
        // For this example we won't use this, however I left it in to be more robust
        nodeReadableStreamOptions?: ReadableOptions
    ) {
        super(nodeReadableStreamOptions);
        this._maxContentLength = maxLength;
        this._s3 = s3;
        this._s3StreamParams = parameters;
    }

    _read() {
        if (this._currentCursorPosition >= this._maxContentLength) {
            // If the current position has reached the amount of bytes in the file (the last valid byte offset is length - 1)
            // We push null into the buffer, NodeJS ReadableStream will see this as the end of file (EOF) and emit the 'end' event
            this.push(null);
        } else {
            // Calculate the range of bytes we want to grab
            const range = this._currentCursorPosition + this._s3DataRange;
            // If the range is greater than the total number of bytes in the file
            // We adjust the range to grab the remaining bytes of data
            const adjustedRange =
                range < this._maxContentLength ? range : this._maxContentLength;
            // Set the Range property on our s3 stream parameters
            this._s3StreamParams.Range = `bytes=${this._currentCursorPosition}-${adjustedRange}`;
            // Update the current range beginning for the next go
            this._currentCursorPosition = adjustedRange + 1;
            // Grab the range of bytes from the file
            this._s3.getObject(this._s3StreamParams, (error, data) => {
                if (error) {
                    // If we encounter an error grabbing the bytes
                    // We destroy the stream, NodeJS ReadableStream will emit the 'error' event
                    this.destroy(error);
                } else {
                    // We push the data into the stream buffer
                    this.push(data.Body);
                }
            });
        }
    }
}
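To try the chunked-read pattern without an S3 bucket, the sketch below swaps S3 for an in-memory Buffer. It is an illustration under stated assumptions, not the class above: `ChunkedReadable`, `FetchRange`, `fetchFromMemory`, and `readLines` are hypothetical names, and `fetchRange` stands in for a ranged `getObject` call. It shows that readline reassembles lines correctly even when a chunk boundary falls in the middle of a line.

```typescript
import { Readable } from "stream";
import * as readline from "readline";

// Hypothetical signature: resolves with bytes start..end (inclusive), like an HTTP Range request
type FetchRange = (start: number, end: number) => Promise<Buffer>;

class ChunkedReadable extends Readable {
    private position = 0; // Offset of the next byte to request

    constructor(
        private readonly fetchRange: FetchRange,
        private readonly totalBytes: number,
        private readonly chunkBytes: number
    ) {
        super();
    }

    _read(): void {
        if (this.position >= this.totalBytes) {
            this.push(null); // No bytes left: signal end of stream
            return;
        }
        const start = this.position;
        const end = Math.min(start + this.chunkBytes, this.totalBytes) - 1;
        this.position = end + 1;
        // Readable does not call _read again until push is called,
        // so only one request is in flight at a time
        this.fetchRange(start, end).then(
            (chunk) => this.push(chunk),
            (error) => this.destroy(error)
        );
    }
}

// In-memory stand-in for the S3 object (assumption for this sketch)
const fileContents = Buffer.from("a,b\n1,2\n3,4\n");
const fetchFromMemory: FetchRange = async (start, end) =>
    fileContents.subarray(start, end + 1);

async function readLines(): Promise<string[]> {
    // 5-byte chunks deliberately split lines across requests
    const input = new ChunkedReadable(fetchFromMemory, fileContents.length, 5);
    const lineReader = readline.createInterface({ input });
    const lines: string[] = [];
    for await (const line of lineReader) {
        lines.push(line);
    }
    return lines;
}
```

Calling `readLines()` should resolve with the three CSV lines, even though the 5-byte chunks cut through them.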

To use it in the createAWSStream function, I simply replaced the line where I created the readStream:

const stream = s3.getObject(params).createReadStream();

and instead created an instance of my SmartStream class, passing in the S3 params object, the S3 instance, and the content length of the data:

const stream = new SmartStream(params, s3, data.ContentLength);

