Write POJOs to a Parquet file using reflection

Question

Hi, I'm looking for APIs to write Parquet files from POJOs that I already have. I was able to generate an Avro schema using reflection and then create a Parquet schema from it using AvroSchemaConverter. However, I cannot find a way to convert POJOs to GenericRecords (Avro); if I could, I would be able to use AvroParquetWriter to write the POJOs out to Parquet files. Any suggestions?

Answer 1

If you want to go through Avro, you have two options:

1) Let Avro generate your POJOs (see the tutorial here). The generated classes implement SpecificRecord and can therefore be used with AvroParquetWriter.

2) Write the conversion from your POJO to GenericRecord yourself. You can do this either by hand or, more generically, with reflection. However, I ran into difficulties with this approach when I tried to read the data back: based on the supplied schema, Avro found the POJO on the classpath and tried to instantiate a SpecificRecord instead of a GenericRecord. For this reason I went with option 1.

Parquet now also supports writing POJOs directly. Here is the pull request on the Parquet GitHub page. However, I think this is not part of an official release yet; in other words, I did not find this code in Maven.
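For anyone reading this later, the direct route could look roughly like the sketch below. It assumes a parquet-avro release whose AvroParquetWriter exposes a builder with withSchema and withDataModel, and it needs Avro, parquet-avro, and Hadoop on the classpath. The User class, the writeUsers helper, and the output path are hypothetical names made up for illustration; they do not come from the pull request.

```java
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.reflect.ReflectData;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

class ReflectParquetExample {

    // Hypothetical POJO, used only for illustration.
    public static class User {
        public String name;
        public int age;

        public User() {}  // no-arg constructor, useful if the data is read back via reflection

        public User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Writes the given POJOs to a new Parquet file (fails if the file already exists).
    public static void writeUsers(String file, Iterable<User> users) throws IOException {
        // Derive the Avro schema from the class; AllowNull marks non-primitive fields as nullable.
        Schema schema = ReflectData.AllowNull.get().getSchema(User.class);

        // Assumption: the builder API with withDataModel is available in the parquet-avro version in use.
        try (ParquetWriter<User> writer = AvroParquetWriter.<User>builder(new Path(file))
                .withSchema(schema)
                .withDataModel(ReflectData.AllowNull.get())
                .build()) {
            for (User user : users) {
                writer.write(user);
            }
        }
    }
}
```

With the reflect data model the writer takes the POJOs as they are, so no manual conversion to GenericRecord is needed on the write path.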

Answer 2

DISCLAIMER: The following code was written when I was in a hurry. It is not efficient, and future versions of Parquet will surely address this more directly. That said, it is a lightweight, if inefficient, approach to what you need. The strategy is POJO -> Avro -> Parquet.

POJO -> Avro: Declare a schema via reflection. Declare writers and readers based on the schema. At conversion time, write the object to a byte stream and read it back as an Avro record.
Avro -> Parquet: Use the AvroParquetWriter included in the parquet-mr project.

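// These members go inside your POJO class; YOURCLASS is a placeholder for its name.
// Schema derived by reflection; AllowNull marks non-primitive fields as nullable.
private static final Schema avroSchema = ReflectData.AllowNull.get().getSchema(YOURCLASS.class);
// Serializes the POJO to Avro binary via reflection.
private static final ReflectDatumWriter<YOURCLASS> reflectDatumWriter = new ReflectDatumWriter<>(avroSchema);
// Decodes the same bytes as a generic Avro record.
private static final GenericDatumReader<Object> genericRecordReader = new GenericDatumReader<>(avroSchema);

// Round-trips this object through Avro binary encoding to obtain a GenericRecord.
public GenericRecord toAvroGenericRecord() throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    reflectDatumWriter.write(this, EncoderFactory.get().directBinaryEncoder(bytes, null));
    return (GenericRecord) genericRecordReader.read(null, DecoderFactory.get().binaryDecoder(bytes.toByteArray(), null));
}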

One more thing: the Parquet writers currently seem to be very strict about null fields. Make sure none of your fields are null before attempting to write to Parquet.
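A quick way to check for nulls before writing is a small reflection helper. The sketch below uses only the JDK; NullFieldCheck, nullFields, and the User class are hypothetical names made up for illustration, and the helper only inspects the top-level fields of the object, not nested objects.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class NullFieldCheck {

    // Hypothetical POJO, used only for illustration.
    public static class User {
        private final String name;
        private final Integer age;

        public User(String name, Integer age) {
            this.name = name;
            this.age = age;
        }
    }

    // Returns the names of all instance fields of the given object whose value is null.
    public static List<String> nullFields(Object pojo) {
        List<String> names = new ArrayList<>();
        // Walk up the class hierarchy so inherited fields are checked too.
        for (Class<?> c = pojo.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                    continue;  // skip constants and compiler-generated fields
                }
                field.setAccessible(true);
                try {
                    if (field.get(pojo) == null) {
                        names.add(field.getName());
                    }
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return names;
    }
}
```

For example, nullFields(new User("alice", null)) returns a list containing only "age", so you can skip, fix, or reject that record before it reaches the writer.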

Answer 3

I wasn't able to find an existing solution, so I implemented it myself. Here is the link to the implementation: https://gist.github.com/alexeygrigorev/eab72e40c6051e0163a6693054906d66

In short, it does the following:

- It uses reflection to get the Avro schema from the POJO.
- Using the schema and reflection, it converts POJOs to GenericRecord objects.
- Reflection is applied recursively if the POJO contains other POJOs or lists of POJOs.
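To show the recursive part without needing Avro on the classpath, here is a minimal sketch that is not the code from the gist. It uses a plain Map as a stand-in for GenericRecord (an assumption made purely so the example runs with the JDK alone); PojoToRecord, Person, and Address are hypothetical names. It handles only strings, boxed primitives, lists, and nested POJOs, and it does not guard against cyclic references.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PojoToRecord {

    // Hypothetical POJOs, used only for illustration.
    public static class Address {
        String city;
        public Address(String city) { this.city = city; }
    }

    public static class Person {
        String name;
        int age;
        Address home;
        List<Address> previous;

        public Person(String name, int age, Address home, List<Address> previous) {
            this.name = name;
            this.age = age;
            this.home = home;
            this.previous = previous;
        }
    }

    // Leaf values pass through; lists and nested POJOs are converted recursively.
    public static Object convert(Object value) {
        if (value == null || value instanceof String || value instanceof Number
                || value instanceof Boolean || value instanceof Character) {
            return value;
        }
        if (value instanceof List) {
            List<Object> out = new ArrayList<>();
            for (Object item : (List<?>) value) {
                out.add(convert(item));
            }
            return out;
        }
        return toRecord(value);  // anything else is treated as a nested POJO
    }

    // Builds a field-name -> value map; the Map stands in for an Avro GenericRecord.
    public static Map<String, Object> toRecord(Object pojo) {
        Map<String, Object> record = new LinkedHashMap<>();
        for (Field field : pojo.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                continue;
            }
            field.setAccessible(true);
            try {
                record.put(field.getName(), convert(field.get(pojo)));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return record;
    }
}
```

In an Avro-based version, toRecord would create a GenericData.Record from the reflected schema and put each converted value into it instead of a Map.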

Answer 1: 1, accepted, 2014-10-24 21:39:25
Answer 2: 0, 2015-03-27 04:13:22
Answer 3: 0, 2018-01-17 13:49:27
