How can I write a NULL value to Parquet using org.apache.parquet.hadoop.ParquetWriter?
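I have a tool that uses org.apache.parquet.hadoop.ParquetWriter to convert CSV data files into Parquet data files.
I can write the basic primitive types (INT32, DOUBLE, BINARY strings) without any problems.
I need to write NULL values, but I don't know how. I tried writing a null with ParquetWriter, and an exception is thrown.
How can I write NULL using org.apache.parquet.hadoop.ParquetWriter? Is there a nullable type?
I believe the code is self-explanatory: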
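import java.util.ArrayList;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.column.ParquetProperties;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.example.GroupWriteSupport;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.PrimitiveType;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
import org.apache.parquet.schema.Type;

// all three columns are declared OPTIONAL, i.e. nullable
ArrayList<Type> fields = new ArrayList<>();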
fields.add(new PrimitiveType(Type.Repetition.OPTIONAL, PrimitiveTypeName.INT32, "int32_col", null));
fields.add(new PrimitiveType(Type.Repetition.OPTIONAL, PrimitiveTypeName.DOUBLE, "double_col", null));
fields.add(new PrimitiveType(Type.Repetition.OPTIONAL, PrimitiveTypeName.BINARY, "string_col", null));
MessageType schema = new MessageType("input", fields);
Configuration configuration = new Configuration();
configuration.setQuietMode(true);
GroupWriteSupport.setSchema(schema, configuration);
SimpleGroupFactory f = new SimpleGroupFactory(schema);
ParquetWriter<Group> writer = new ParquetWriter<Group>(
        new Path("output.parquet"),
        new GroupWriteSupport(),
        CompressionCodecName.SNAPPY,
        ParquetWriter.DEFAULT_BLOCK_SIZE,
        ParquetWriter.DEFAULT_PAGE_SIZE,
        1048576,  // dictionary page size
        true,     // enable dictionary encoding
        false,    // validating
        ParquetProperties.WriterVersion.PARQUET_1_0,
        configuration
);
// create row 1 with defined values
Group group1 = f.newGroup();
Integer int1 = 100;
Double double1 = 0.5;
String string1 = "string-value";
group1.add(0, int1);
group1.add(1, double1);
group1.add(2, string1);
writer.write(group1);
// create row 2 with NULL values -- does not work!
Group group2 = f.newGroup();
Integer int2 = null;
Double double2 = null;
String string2 = null;
group2.add(0, int2); // <-- throws NullPointerException
group2.add(1, double2); // <-- throws NullPointerException
group2.add(2, string2); // <-- throws NullPointerException
writer.write(group2);
writer.close();
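The NullPointerException on the int and double columns comes from Java auto-unboxing rather than from the Parquet file format: Group.add(int, int) and Group.add(int, double) take primitive values, so a null Integer or Double is unboxed before the call even starts. A minimal, Parquet-free sketch of that effect follows; the class UnboxingNpeDemo and its addInt method are hypothetical stand-ins that only mimic the parameter types of Group.add(int, int).

```java
public class UnboxingNpeDemo {

    // Hypothetical stand-in with the same parameter types as Group.add(int fieldIndex, int value):
    // the value parameter is a primitive int, not an Integer.
    static int addInt(int fieldIndex, int value) {
        return value;
    }

    // Returns true if passing a null Integer to a primitive-int parameter throws.
    static boolean nullUnboxingThrows() {
        Integer int2 = null;
        try {
            addInt(0, int2); // auto-unboxing calls int2.intValue() on null -> NullPointerException
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(nullUnboxingThrows()); // prints "true"
    }
}
```

Because the exception is raised at the call site, no overload of add can accept "no value"; the field has to be left out instead.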
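The solution turned out to be very simple: don't write a value for the field at all. An OPTIONAL field that is never set is written as null: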
// create row 1 with defined values
Group group1 = f.newGroup();
Integer int1 = 100;
Double double1 = 0.5;
String string1 = "string-value";
group1.add(0, int1);
group1.add(1, double1);
group1.add(2, string1);
writer.write(group1);
// create row 2 with NULL values: leave every OPTIONAL field unset
Group group2 = f.newGroup();
// do not call group2.add(...) at all; unset OPTIONAL fields are written as null
writer.write(group2);
// now the Parquet file has 2 rows: one with values, one with all-null values
writer.close();
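For a CSV converter, the same idea can be wrapped in a small helper so that NULL cells are simply never added to the Group. This is a minimal sketch, assuming the hypothetical convention that an empty CSV cell means NULL; the class NullSafeFields and its methods are hypothetical, and the List in main only stands in for Group.add so the sketch runs without Parquet on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class NullSafeFields {

    // Calls adder only when value is non-null. A null value is skipped, so the
    // corresponding OPTIONAL field stays unset (and is therefore written as null).
    public static <T> void addIfPresent(T value, Consumer<T> adder) {
        if (value != null) {
            adder.accept(value);
        }
    }

    // Assumed CSV convention: a missing or empty cell means NULL; otherwise parse it.
    public static <T> T parseOrNull(String cell, Function<String, T> parser) {
        if (cell == null || cell.isEmpty()) {
            return null;
        }
        return parser.apply(cell);
    }

    public static void main(String[] args) {
        // Stand-in for Group.add(...): records which field indexes were written.
        List<String> written = new ArrayList<>();
        String[] row = {"100", "", "string-value"};

        Integer i = parseOrNull(row[0], Integer::valueOf);
        Double d = parseOrNull(row[1], Double::valueOf);
        String s = parseOrNull(row[2], Function.identity());

        addIfPresent(i, v -> written.add("0=" + v));
        addIfPresent(d, v -> written.add("1=" + v));
        addIfPresent(s, v -> written.add("2=" + v));

        System.out.println(written); // prints "[0=100, 2=string-value]"
    }
}
```

In the converter itself the consumer would be a call such as `v -> group2.add(0, v)`, which only unboxes v after it is known to be non-null.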