Writing parquet files to S3 using AWS Java Lambda
I'm writing an AWS Lambda function that reads protobuf objects from Kinesis and would like to write them to S3 as Parquet files.
I saw there's an implementation of ParquetWriter for protobuf called ProtoParquetWriter, which is good. My problem is that ProtoParquetWriter expects a Path in its constructor.
What's the right way to do that without saving the content as a local Parquet file, assuming I'm not using the file system at all?
If you want to write to S3, you can set the Path to Path("s3a://<bucketName>/<s3Key>"). And don't forget to set the S3 credentials in the configuration:
conf.set("fs.s3a.access.key", "<s3AccessKey");
conf.set("fs.s3a.secret.key", "<s3SecretKey");
Assuming you have a List<Transaction> (the elements can be any complex object), here is sample code to read and write protobuf-backed Parquet on S3:
// Standard imports; app-specific classes (S3Utilities, Transaction, TransactionProtos, Convertor)
// come from the project itself and are not shown here. Logger is assumed to be SLF4J.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import com.google.common.base.Stopwatch;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;
import org.apache.parquet.proto.ProtoParquetReader;
import org.apache.parquet.proto.ProtoParquetWriter;
import org.slf4j.Logger;

public class SimpleS3ParquetUtilities implements S3Utilities {

    final Logger logger;
    String PATH_SCHEMA = "s3a";
    CompressionCodecName compressionCodecName;

    public SimpleS3ParquetUtilities(Logger logger) {
        this.logger = logger;
        this.compressionCodecName = CompressionCodecName.UNCOMPRESSED;
    }

    public SimpleS3ParquetUtilities(Logger logger, CompressionCodecName compressionCodecName) {
        this.logger = logger;
        this.compressionCodecName = compressionCodecName;
    }

    @Override
    public String writeTransactions(String bucket, String objectKey, List<Transaction> transactions)
            throws Exception {
        if (objectKey.charAt(0) != '/')
            objectKey = "/" + objectKey;
        Path file = new Path(PATH_SCHEMA, bucket, objectKey);
        Stopwatch sw = Stopwatch.createStarted();
        // convert the list into protobuf messages
        List<TransactionProtos.Transaction> protoTransactions = Convertor.toProtoBuf(transactions);
        // write each protobuf message to the s3a path
        try (ProtoParquetWriter<TransactionProtos.Transaction> writer = new ProtoParquetWriter<TransactionProtos.Transaction>(
                file, TransactionProtos.Transaction.class, this.compressionCodecName,
                ProtoParquetWriter.DEFAULT_BLOCK_SIZE, ProtoParquetWriter.DEFAULT_PAGE_SIZE)) {
            for (TransactionProtos.Transaction transaction : protoTransactions) {
                writer.write(transaction);
            }
        }
        logger.info("Parquet write elapsed:[{}{}] Time:{}ms items:{}", bucket, objectKey,
                sw.elapsed(TimeUnit.MILLISECONDS), transactions.size());
        return "";
    }

    @Override
    public List<Transaction> readTransactions(String bucket, String pathWithFileName)
            throws Exception {
        if (pathWithFileName.charAt(0) != '/')
            pathWithFileName = "/" + pathWithFileName;
        Path file = new Path(PATH_SCHEMA, bucket, pathWithFileName);
        Stopwatch sw = Stopwatch.createStarted();
        try (ParquetReader<TransactionProtos.Transaction.Builder> reader = ProtoParquetReader.<TransactionProtos.Transaction.Builder>builder(
                file).build()) {
            List<TransactionProtos.Transaction> transactions = new ArrayList<TransactionProtos.Transaction>();
            // the reader returns a builder per record; build each one into a message
            TransactionProtos.Transaction.Builder builder = reader.read();
            while (builder != null) {
                TransactionProtos.Transaction transaction = builder.build();
                transactions.add(transaction);
                builder = reader.read();
            }
            logger.info("Parquet read elapsed:[{}{}] Time:{}ms items:{}", bucket, pathWithFileName,
                    sw.elapsed(TimeUnit.MILLISECONDS), transactions.size());
            return Convertor.fromProtoBuf(transactions);
        }
    }
}
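To tie this back to the original question, below is a hedged sketch of how the utility above might be invoked from a Kinesis-triggered Lambda handler. The bucket name, the key naming scheme, and the assumption that each Kinesis record carries exactly one serialized TransactionProtos.Transaction are illustrative; Convertor.fromProtoBuf and SimpleS3ParquetUtilities are the project-specific classes from the answer's code.

import java.util.ArrayList;
import java.util.List;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.KinesisEvent;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class KinesisToParquetHandler implements RequestHandler<KinesisEvent, Void> {

    private static final Logger logger = LoggerFactory.getLogger(KinesisToParquetHandler.class);

    // Placeholder bucket name for illustration.
    private static final String BUCKET = "my-parquet-bucket";

    // Defaults to UNCOMPRESSED, as in the single-argument constructor above.
    private final SimpleS3ParquetUtilities utilities = new SimpleS3ParquetUtilities(logger);

    @Override
    public Void handleRequest(KinesisEvent event, Context context) {
        try {
            // Assumption: each Kinesis record payload is one serialized TransactionProtos.Transaction.
            List<TransactionProtos.Transaction> protos = new ArrayList<>();
            for (KinesisEvent.KinesisEventRecord record : event.getRecords()) {
                protos.add(TransactionProtos.Transaction.parseFrom(record.getKinesis().getData()));
            }
            // Map protobuf messages back to domain objects with the project-specific converter.
            List<Transaction> transactions = Convertor.fromProtoBuf(protos);
            // One Parquet object per invocation; the key scheme is just an example.
            String objectKey = "transactions/" + context.getAwsRequestId() + ".parquet";
            utilities.writeTransactions(BUCKET, objectKey, transactions);
            return null;
        } catch (Exception e) {
            throw new RuntimeException("Failed to write Parquet batch to S3", e);
        }
    }
}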