Streaming files from AWS S3 with NodeJS
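I am trying to stream data from a large CSV file into readline. I tried piping the readStream from S3 into the readline input, but I ran into an error because S3 only allows a connection to stay open for a limited amount of time.
I am creating the stream from S3 like so: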
import * as AWS from 'aws-sdk';
import {Readable} from 'stream';
import {s3Env} from '../config';

export default async function createAWSStream(): Promise<Readable> {
    return new Promise((resolve, reject) => {
        const params = {
            Bucket: s3Env.bucket,
            Key: s3Env.key
        };
        try {
            const s3 = new AWS.S3({
                accessKeyId: s3Env.accessKey,
                secretAccessKey: s3Env.secret
            });
            // headObject confirms the object exists and returns its metadata
            s3.headObject(params, (error, data) => {
                if (error) {
                    // Reject here: a throw inside this callback would not reach the outer catch
                    return reject(error);
                }
                const stream = s3.getObject(params).createReadStream();
                resolve(stream);
            });
        } catch (error) {
            reject(error);
        }
    });
}
Then I feed it into readline:
import * as readline from 'readline';
import createAWSStream from './createAWSStream';

export const readCSVFile = async function(): Promise<void> {
    const rStream = await createAWSStream();
    const lineReader = readline.createInterface({
        input: rStream
    });
    for await (const line of lineReader) {
        // process line
    }
}
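I found that the timeout for the S3 connection is set to 120000 ms (2 minutes). I tried simply raising the timeout, but then I ran into more timeout issues from the HTTPS connection.
How can I stream data from AWS S3 the right way, without setting a bunch of timeouts to some extremely large timeframe?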
I was able to work out a solution for this using the AWS S3 Range property and by creating a custom readable stream with the NodeJS Stream API.
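By using this "smart stream" I was able to grab the data in chunks, each in a separate request to the S3 instance. By grabbing the data in chunks, I avoided any timeout errors and created a more efficient stream. The NodeJS Readable super class handles the buffer so as not to overload the input to readline. It also automatically handles pausing and resuming the stream.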
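The range arithmetic behind this chunking can be sketched on its own. This is a minimal illustration, assuming the object's size is known up front (for example from the `ContentLength` returned by `headObject`); `byteRanges` is a hypothetical helper and not part of the class. HTTP Range offsets are inclusive on both ends, so the last valid offset is one less than the content length.

```typescript
// Hypothetical helper (not from the original answer): lists the Range header
// values needed to read `contentLength` bytes in pieces of `chunkSize` bytes.
function byteRanges(contentLength: number, chunkSize: number): string[] {
  if (chunkSize <= 0) {
    throw new RangeError("chunkSize must be positive");
  }
  const ranges: string[] = [];
  for (let start = 0; start < contentLength; start += chunkSize) {
    // Range end offsets are inclusive; the last valid offset is contentLength - 1.
    const end = Math.min(start + chunkSize, contentLength) - 1;
    ranges.push(`bytes=${start}-${end}`);
  }
  return ranges;
}

console.log(byteRanges(10, 4).join(", ")); // bytes=0-3, bytes=4-7, bytes=8-9
```

Each entry corresponds to one short-lived request, which is why no single connection has to stay open long enough to hit the timeout.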
This class makes it easy to stream large files from AWS S3:
import {Readable, ReadableOptions} from "stream";
import type {S3} from "aws-sdk";

export class SmartStream extends Readable {
    _currentCursorPosition = 0; // Holds the current starting position for our range queries
    _s3DataRange = 2048 * 1024; // Number of bytes to grab per request (I have jacked this up for HD video files)
    _maxContentLength: number; // Total number of bytes in the file
    _s3: S3; // AWS.S3 instance
    _s3StreamParams: S3.GetObjectRequest; // Parameters passed into s3.getObject method

    constructor(
        parameters: S3.GetObjectRequest,
        s3: S3,
        maxLength: number,
        // You can pass any ReadableStream options to the NodeJS Readable super class here
        // For this example we won't use this, however I left it in to be more robust
        nodeReadableStreamOptions?: ReadableOptions
    ) {
        super(nodeReadableStreamOptions);
        this._maxContentLength = maxLength;
        this._s3 = s3;
        this._s3StreamParams = parameters;
    }

    _read() {
        if (this._currentCursorPosition >= this._maxContentLength) {
            // If the current position has reached the number of bytes in the file
            // We push null into the buffer, NodeJS ReadableStream will see this as the end of file (EOF) and emit the 'end' event
            this.push(null);
        } else {
            // Calculate the range of bytes we want to grab
            const range = this._currentCursorPosition + this._s3DataRange;
            // Range offsets are inclusive, so the last valid byte is at maxContentLength - 1
            const lastByte = this._maxContentLength - 1;
            // If the range runs past the end of the file
            // We adjust the range to grab the remaining bytes of data
            const adjustedRange = range < lastByte ? range : lastByte;
            // Set the Range property on our s3 stream parameters
            this._s3StreamParams.Range = `bytes=${this._currentCursorPosition}-${adjustedRange}`;
            // Update the current range beginning for the next go
            this._currentCursorPosition = adjustedRange + 1;
            // Grab the range of bytes from the file
            this._s3.getObject(this._s3StreamParams, (error, data) => {
                if (error) {
                    // If we encounter an error grabbing the bytes
                    // We destroy the stream, NodeJS ReadableStream will emit the 'error' event
                    this.destroy(error);
                } else {
                    // We push the data into the stream buffer
                    this.push(data.Body);
                }
            });
        }
    }
}
To use this in the createAWSStream function, I simply replaced the line that created the readStream:
const stream = s3.getObject(params).createReadStream();
with an instance of my SmartStream class, passing in the S3 params object, the S3 instance, and the content length of the data:
const stream = new SmartStream(params, s3, data.ContentLength);
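For experimenting with this pattern without AWS credentials, the same pull-based approach can be sketched against an in-memory buffer. Everything named here (`RangeFetcher`, `ChunkedStream`, `memoryFetcher`, `collectLines`) is hypothetical and stands in for the ranged `getObject` call; it is an illustration of the technique under those assumptions, not the answer's implementation.

```typescript
import { Readable } from "stream";
import * as readline from "readline";

// Hypothetical stand-in for a ranged S3 getObject call: resolves with the
// bytes from `start` to `end`, both inclusive.
type RangeFetcher = (start: number, end: number) => Promise<Buffer>;

// Same idea as SmartStream, decoupled from aws-sdk so it can run locally.
class ChunkedStream extends Readable {
  private cursor = 0;

  constructor(
    private readonly fetchRange: RangeFetcher,
    private readonly totalBytes: number,
    private readonly chunkSize: number
  ) {
    super();
  }

  _read(): void {
    if (this.cursor >= this.totalBytes) {
      this.push(null); // every byte has been delivered, signal end of stream
      return;
    }
    const start = this.cursor;
    const end = Math.min(start + this.chunkSize, this.totalBytes) - 1;
    this.cursor = end + 1;
    this.fetchRange(start, end).then(
      (chunk) => { this.push(chunk); },
      (error) => { this.destroy(error); }
    );
  }
}

// In-memory fetcher used only for this demonstration.
function memoryFetcher(source: Buffer): RangeFetcher {
  return async (start, end) => source.subarray(start, end + 1);
}

// Reads the stream line by line, the way readCSVFile does.
async function collectLines(input: Readable): Promise<string[]> {
  const lines: string[] = [];
  const lineReader = readline.createInterface({ input, crlfDelay: Infinity });
  for await (const line of lineReader) {
    lines.push(line);
  }
  return lines;
}
```

Calling `collectLines(new ChunkedStream(memoryFetcher(buffer), buffer.length, 5))` with a small CSV buffer shows that readline reassembles lines that straddle chunk boundaries, so the chunk size does not need to align with line endings.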