
Writing Parquet files to S3 using an AWS Java Lambda

I am writing an AWS Lambda that reads protobuf objects from Kinesis, and I would like to write them to S3 as Parquet files.

I saw that there is a ParquetWriter implementation for protobuf called ProtoParquetWriter, which is good. My problem is that ProtoParquetWriter expects a Path in its constructor.

Given that I am not using the file system at all, what is the right way to do this without first saving the content as a Parquet file?

If you want to write to S3, you can set the path to `Path("s3a://<bucketName>/<s3Key>")`. Don't forget to set the S3 credentials in the configuration:

    conf.set("fs.s3a.access.key", "<s3AccessKey>");
    conf.set("fs.s3a.secret.key", "<s3SecretKey>");
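
The `s3a://<bucketName>/<s3Key>` location can also be assembled from a bucket name and an object key before it is handed to Hadoop. The following is only a minimal sketch using the JDK alone: the class name `S3aPaths` and the method `toUri` are hypothetical and are not part of Hadoop or Parquet. The resulting `java.net.URI` could then be passed to Hadoop's `Path(URI)` constructor.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper (not part of Hadoop or Parquet): builds an s3a:// location
// from a bucket name and an object key.
final class S3aPaths {
    private static final String SCHEME = "s3a";

    private S3aPaths() {
    }

    static URI toUri(String bucket, String objectKey) {
        if (bucket == null || bucket.isEmpty()) {
            throw new IllegalArgumentException("bucket must not be empty");
        }
        if (objectKey == null || objectKey.isEmpty()) {
            throw new IllegalArgumentException("objectKey must not be empty");
        }
        // The path part of a hierarchical URI must start with '/'.
        String path = objectKey.startsWith("/") ? objectKey : "/" + objectKey;
        try {
            // scheme://authority/path, with no query and no fragment.
            return new URI(SCHEME, bucket, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid S3 location: " + bucket + path, e);
        }
    }
}
```

Normalizing the leading slash in one place means callers can pass keys with or without it and still get the same location.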

Assuming you have a List (its elements can be any complex object), here is sample code to read and write protobuf objects as Parquet on S3:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.TimeUnit;

    import com.google.common.base.Stopwatch;                       // Guava
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.hadoop.ParquetReader;
    import org.apache.parquet.hadoop.metadata.CompressionCodecName;
    import org.apache.parquet.proto.ProtoParquetReader;
    import org.apache.parquet.proto.ProtoParquetWriter;
    import org.slf4j.Logger;                                       // assumed: SLF4J-style "{}" placeholders

    // S3Utilities, Transaction, TransactionProtos and Convertor are the poster's own types.
    public class SimpleS3ParquetUtilities implements S3Utilities {

        // URI scheme handled by Hadoop's S3A file system.
        private static final String PATH_SCHEMA = "s3a";

        private final Logger logger;
        private final CompressionCodecName compressionCodecName;

        public SimpleS3ParquetUtilities(Logger logger) {
            this(logger, CompressionCodecName.UNCOMPRESSED);
        }

        public SimpleS3ParquetUtilities(Logger logger, CompressionCodecName compressionCodecName) {
            this.logger = logger;
            this.compressionCodecName = compressionCodecName;
        }

        @Override
        public String writeTransactions(String bucket, String objectKey, List<Transaction> transactions)
                throws Exception {
            // The path component must be absolute.
            if (objectKey.charAt(0) != '/') {
                objectKey = "/" + objectKey;
            }
            // Path(scheme, authority, path) yields s3a://<bucket>/<objectKey>.
            Path file = new Path(PATH_SCHEMA, bucket, objectKey);
            Stopwatch sw = Stopwatch.createStarted();
            // Convert the list into protobuf messages.
            List<TransactionProtos.Transaction> protoTransactions = Convertor.toProtoBuf(transactions);
            // try-with-resources closes the writer, which completes the Parquet file.
            try (ProtoParquetWriter<TransactionProtos.Transaction> writer = new ProtoParquetWriter<>(
                    file, TransactionProtos.Transaction.class, this.compressionCodecName,
                    ProtoParquetWriter.DEFAULT_BLOCK_SIZE, ProtoParquetWriter.DEFAULT_PAGE_SIZE)) {
                for (TransactionProtos.Transaction transaction : protoTransactions) {
                    writer.write(transaction);
                }
            }
            logger.info("Parquet write elapsed:[{}{}] Time:{}ms items:{}", bucket, objectKey,
                    sw.elapsed(TimeUnit.MILLISECONDS), transactions.size());
            return "";
        }

        @Override
        public List<Transaction> readTransactions(String bucket, String pathWithFileName)
                throws Exception {
            if (pathWithFileName.charAt(0) != '/') {
                pathWithFileName = "/" + pathWithFileName;
            }
            Path file = new Path(PATH_SCHEMA, bucket, pathWithFileName);
            Stopwatch sw = Stopwatch.createStarted();
            try (ParquetReader<TransactionProtos.Transaction.Builder> reader =
                    ProtoParquetReader.<TransactionProtos.Transaction.Builder>builder(file).build()) {
                List<TransactionProtos.Transaction> transactions = new ArrayList<>();
                // read() returns a message builder, or null once the file is exhausted.
                TransactionProtos.Transaction.Builder builder = reader.read();
                while (builder != null) {
                    transactions.add(builder.build());
                    builder = reader.read();
                }
                logger.info("Parquet read elapsed:[{}{}] Time:{}ms items:{}", bucket, pathWithFileName,
                        sw.elapsed(TimeUnit.MILLISECONDS), transactions.size());
                return Convertor.fromProtoBuf(transactions);
            }
        }
    }

