
How to convert a large stream to a gzipped base64 string

I'm building an analytics platform, and I want to compress the results of my ETL (Extract, Transform, Load) jobs before I store them in my database. Before I start writing the code, I was wondering if someone with some experience could tell me how to do it properly. I want to gzip the data and then convert it to a base64 string. Can I simply gzip the data and convert the result to base64, or will that not work?
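Gzipping and then base64-encoding does work: gzip turns the text into binary bytes, and base64 turns those bytes into plain ASCII text that can be stored anywhere a string can. To read the data back, apply the inverse steps in reverse order. A minimal round-trip sketch using Node's built-in `zlib` (the helper names are hypothetical, and the synchronous calls are only suitable for small payloads):

```javascript
const zlib = require('zlib');

// Compress a string with gzip, then encode the compressed bytes as base64 text.
function gzipToBase64(text) {
    return zlib.gzipSync(text).toString('base64');
}

// Reverse the steps: base64 text -> compressed bytes -> original string.
function base64ToText(encoded) {
    return zlib.gunzipSync(Buffer.from(encoded, 'base64')).toString('utf8');
}

const json = JSON.stringify([{ id: 1, name: 'a' }, { id: 2, name: 'b' }]);
const stored = gzipToBase64(json);
console.log(base64ToText(stored) === json); // true
```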

This is the process I'm currently using for these large datasets:

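// athenaClient and redis are assumed to be already-configured clients
const streamObj = athenaClient.execute('my query').toStream()
const key = 'Some Dashboard Data'
let data = []
let first = true

// Append the buffered records to the stored value, without their enclosing brackets
function flush() {
    if (data.length === 0) return
    const json = JSON.stringify(data)
    redis.append(key, (first ? '' : ',') + json.slice(1, json.length - 1))
    first = false
    data = []
}

redis.set(key, '[')

streamObj.on('data', function (record) {
    // TODO gzip record then convert to base64
    data.push(record)
    if (data.length === 500) flush()
})

streamObj.on('end', function () {
    flush()                // write the last, partial batch
    redis.append(key, ']') // close the JSON array
})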

If this is not possible, is there a way to store the gzipped data directly instead?
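Redis strings are binary-safe, so the gzipped `Buffer` itself can be stored without base64 (Node clients such as node_redis and ioredis accept a `Buffer` as a value, though reading it back as a `Buffer` usually needs a client-specific option). Base64's cost is size: every 3 input bytes become 4 output characters, about 33% more. A sketch that measures this on made-up sample rows:

```javascript
const zlib = require('zlib');

// Made-up, repetitive sample rows standing in for the query results.
const rows = Array.from({ length: 1000 }, (_, i) => ({ id: i, status: 'ok' }));
const compressed = zlib.gzipSync(JSON.stringify(rows));

// base64 emits 4 characters for every 3 input bytes (the last group is padded).
const base64 = compressed.toString('base64');
console.log(compressed.length, base64.length);
console.log(base64.length === 4 * Math.ceil(compressed.length / 3)); // true
```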

Let the Node.js environment control memory usage by relying on the backpressure that streams provide.
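Backpressure means a producer stops writing while the consumer's buffer is full: `writable.write()` returns `false` once the buffered data reaches `highWaterMark`, and the stream emits `'drain'` when it is safe to continue. `.pipe()` and `stream.pipeline()` handle this for you; the sketch below does it by hand to show the mechanism (the helper `writeAll` and the slow sink are hypothetical stand-ins):

```javascript
const { Writable } = require('stream');
const { once } = require('events');

// Write every item to `dest`, pausing whenever its buffer reaches highWaterMark.
// .pipe() and stream.pipeline() perform the same dance automatically.
async function writeAll(dest, items) {
    let drainWaits = 0;
    for (const item of items) {
        if (!dest.write(item)) {
            // write() returned false: stop producing until 'drain' fires
            drainWaits++;
            await once(dest, 'drain');
        }
    }
    const finished = once(dest, 'finish');
    dest.end();
    await finished;
    return drainWaits;
}

// A deliberately slow sink with a 4-byte buffer, standing in for Redis.
const received = [];
const slowSink = new Writable({
    highWaterMark: 4,
    write(chunk, encoding, done) {
        received.push(chunk.toString());
        setImmediate(done); // simulate a slow consumer
    },
});

const demo = writeAll(slowSink, ['aa', 'bb', 'cc', 'dd', 'ee']).then((drainWaits) => {
    console.log(received.join(''), drainWaits > 0); // aabbccddee true
    return drainWaits;
});
```

In the question's code, the `'data'` handler never pauses the Athena stream, so if Redis is slower than Athena, pending commands can pile up in memory; piping avoids that.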

I would consider this solution:

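// inputStream, transformToBase64Stream and redisCli are placeholders
const zlib = require('zlib')

inputStream
    .pipe(zlib.createGzip())          // gzip the raw data
    .pipe(transformToBase64Stream)    // base64-encode the compressed bytes
    .pipe(redisCli);                  // e.g. stdin of a `redis-cli --pipe` child process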
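The `transformToBase64Stream` in the pipeline above is left abstract. One subtlety when writing it: base64 encodes 3 bytes at a time, so encoding each compressed chunk independently would insert `=` padding in the middle of the output whenever a chunk length is not a multiple of 3. A sketch of such a transform (the class name `Base64Encode` is hypothetical), wired with `stream.pipeline`, which also forwards errors from every stage:

```javascript
const { Transform, Readable, Writable, pipeline } = require('stream');
const zlib = require('zlib');

// Base64-encodes a byte stream. Only whole 3-byte groups are encoded per chunk;
// leftover bytes are carried into the next chunk, so '=' padding can only
// appear at the very end and the encoded chunks concatenate into valid base64.
class Base64Encode extends Transform {
    constructor(options) {
        super(options);
        this.remainder = Buffer.alloc(0);
    }

    _transform(chunk, encoding, done) {
        const data = Buffer.concat([this.remainder, chunk]);
        const usable = data.length - (data.length % 3);
        this.remainder = data.subarray(usable);
        if (usable > 0) this.push(data.subarray(0, usable).toString('base64'));
        done();
    }

    _flush(done) {
        // Encode the last 1-2 bytes, if any (this is where padding is added).
        if (this.remainder.length > 0) this.push(this.remainder.toString('base64'));
        done();
    }
}

// Demo: an in-memory source and sink stand in for the Athena stream and redis-cli.
const parts = [];
const demo = new Promise((resolve, reject) => {
    pipeline(
        Readable.from(['{"id":1}', '{"id":2}', '{"id":3}']),
        zlib.createGzip(),
        new Base64Encode(),
        new Writable({
            write(chunk, encoding, done) {
                parts.push(chunk.toString());
                done();
            },
        }),
        (err) => (err ? reject(err) : resolve(parts.join('')))
    );
});

demo.then((encoded) => {
    console.log(zlib.gunzipSync(Buffer.from(encoded, 'base64')).toString()); // {"id":1}{"id":2}{"id":3}
});
```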

zlib is a built-in Node.js module, so it should not cause any problems. To convert to base64, you can write a Transform stream or use an external tool. To pipe the results into Redis as a stream, you could spawn a redis-cli child process in pipe mode. As the Redis mass-insertion and redis-cli articles mention, this is the suggested approach for large amounts of data, but you have to generate the Redis protocol yourself. Read those articles and let me know whether they help you solve the problem.
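The mass-insertion approach means generating the Redis protocol (RESP) yourself and writing it to the stdin of `redis-cli --pipe`. A minimal encoder sketch (the function name `encodeCommand` and the chunk values are hypothetical); because a RESP bulk string must announce its length before the data, a streamed value is easiest to send as one `SET` followed by one `APPEND` per base64 chunk:

```javascript
// Encode one command as a RESP array of bulk strings, the format that
// `redis-cli --pipe` reads from stdin. Lengths are byte lengths, not characters.
function encodeCommand(args) {
    let out = `*${args.length}\r\n`;
    for (const arg of args) {
        const value = String(arg);
        out += `$${Buffer.byteLength(value, 'utf8')}\r\n${value}\r\n`;
    }
    return out;
}

// Hypothetical usage: reset the key, then append base64 chunks as they arrive.
const key = 'Some Dashboard Data';
const chunks = ['AAAA', 'BBBB']; // placeholder base64 chunks
const payload = [encodeCommand(['SET', key, ''])]
    .concat(chunks.map((chunk) => encodeCommand(['APPEND', key, chunk])))
    .join('');

console.log(JSON.stringify(encodeCommand(['SET', 'k', 'v']))); // "*3\r\n$3\r\nSET\r\n$1\r\nk\r\n$1\r\nv\r\n"
```

In Node, `child_process.spawn('redis-cli', ['--pipe'])` returns a process whose `stdin` is a writable stream, so `payload` (or each encoded command as it is produced) could be written there; this assumes redis-cli is installed and on the PATH.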

To elaborate on Zilvinas's answer, here is how I got it to work:

const athena = require('./athena')
const redis = require('./redis')
const zlib = require('zlib')
const Stream = require('stream')

exports.persistStream = (config, query, name, transform) => {
    return new Promise((resolve, reject) => {
        let recordCount = 0

        // Object-mode input (Athena records), byte output (JSON text) for gzip
        const transformStream = new Stream.Transform({ writableObjectMode: true })

        // Serialize the records as one JSON array: '[' before the first record, ',' before the rest
        transformStream._transform = function (chunk, encoding, done) {
            recordCount++
            if (transform) chunk = transform(chunk)
            this.push((recordCount === 1 ? '[' : ',') + JSON.stringify(chunk))
            done()
        }

        // _flush runs after the last record: close the array (or emit [] for an empty result)
        transformStream._flush = function (done) {
            this.push(recordCount === 0 ? '[]' : ']')
            done()
        }

        // Collect the compressed bytes; with a Writable as the last stage, the
        // pipeline callback only fires once every chunk has been received
        const buffers = []
        const collect = new Stream.Writable({
            write(chunk, encoding, done) {
                buffers.push(chunk)
                done()
            }
        })

        // pipeline() forwards an error from any stage to the callback (plain .pipe() does not)
        Stream.pipeline(athena.execute(query).toStream(), transformStream, zlib.createGzip(), collect, (err) => {
            if (err) {
                console.log(err)
                return reject(err)
            }

            const buffer = Buffer.concat(buffers)
            redis.set(name, buffer.toString('base64'), (err) => {
                if (err) return reject(err)

                // The config is stored gzipped and base64-encoded as well
                // (config must be a string or Buffer for zlib.gzip)
                zlib.gzip(config, (err, buff) => {
                    if (err) return reject(err)

                    redis.set(name + ' Config', buff.toString('base64'), (err) => {
                        if (err) {
                            console.log(err)
                            return reject(err)
                        }
                        console.log(name + ' succeeded')
                        resolve()
                    })
                })
            })
        })
    })
}
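Reading a stored value back means reversing the steps in the opposite order: base64-decode, gunzip, then `JSON.parse`. A sketch (the function name `decodeStoredValue` is hypothetical, and fetching the string from Redis is left out):

```javascript
const zlib = require('zlib');
const { promisify } = require('util');

const gunzip = promisify(zlib.gunzip);

// Reverse of the write path: base64 text -> gzip bytes -> JSON text -> records.
async function decodeStoredValue(base64Value) {
    const json = await gunzip(Buffer.from(base64Value, 'base64'));
    return JSON.parse(json.toString('utf8'));
}

// Usage with a locally built value; in practice base64Value would come from a Redis GET.
const sample = zlib.gzipSync('[{"id":1},{"id":2}]').toString('base64');
decodeStoredValue(sample).then((records) => console.log(records.length)); // 2
```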

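Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.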

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM