How can I write NULL value to parquet using org.apache.parquet.hadoop.ParquetWriter?
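I have a tool that uses org.apache.parquet.hadoop.ParquetWriter to convert CSV data files to parquet data files.
I can write basic primitive types (INT32, DOUBLE, BINARY string) just fine.
I need to write NULL values, but I do not know how. I tried simply writing a null with ParquetWriter, and it throws an exception.
How can I write NULL using org.apache.parquet.hadoop.ParquetWriter? Is there a nullable type?
I believe the code is self-explanatory: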
ArrayList<Type> fields = new ArrayList<>();
fields.add(new PrimitiveType(Type.Repetition.OPTIONAL, PrimitiveTypeName.INT32, "int32_col", null));
fields.add(new PrimitiveType(Type.Repetition.OPTIONAL, PrimitiveTypeName.DOUBLE, "double_col", null));
fields.add(new PrimitiveType(Type.Repetition.OPTIONAL, PrimitiveTypeName.BINARY, "string_col", null));
MessageType schema = new MessageType("input", fields);
Configuration configuration = new Configuration();
configuration.setQuietMode(true);
GroupWriteSupport.setSchema(schema, configuration);
SimpleGroupFactory f = new SimpleGroupFactory(schema);
ParquetWriter<Group> writer = new ParquetWriter<Group>(
new Path("output.parquet"),
new GroupWriteSupport(),
CompressionCodecName.SNAPPY,
ParquetWriter.DEFAULT_BLOCK_SIZE,
ParquetWriter.DEFAULT_PAGE_SIZE,
1048576,
true,
false,
ParquetProperties.WriterVersion.PARQUET_1_0,
configuration
);
// create row 1 with defined values
Group group1 = f.newGroup();
Integer int1 = 100;
Double double1 = 0.5;
String string1 = "string-value";
group1.add(0, int1);
group1.add(1, double1);
group1.add(2, string1);
writer.write(group1);
// create row 2 with NULL values -- does not work!
Group group2 = f.newGroup();
Integer int2 = null;
Double double2 = null;
String string2 = null;
group2.add(0, int2); // <-- throws NullPointerException
group2.add(1, double2); // <-- throws NullPointerException
group2.add(2, string2); // <-- throws NullPointerException
writer.write(group2);
writer.close();
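The solution turned out to be very simple: just do not write the value. Because the fields are declared OPTIONAL, a field that is never added to the group is written as NULL: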
// create row 1 with defined values
Group group1 = f.newGroup();
Integer int1 = 100;
Double double1 = 0.5;
String string1 = "string-value";
group1.add(0, int1);
group1.add(1, double1);
group1.add(2, string1);
writer.write(group1);
// create row 2 with NULL values: add nothing to the group
Group group2 = f.newGroup();
// do nothing! OPTIONAL fields that are never added are written as NULL
writer.write(group2);
writer.close();
// Now, parquet file will have 2 rows, one with values, one with null values
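
For a CSV converter, the same idea can be applied per cell: call add only when the cell holds a value, and skip it otherwise. Below is a minimal sketch of that loop in plain Java. The class name NullSkippingRow, the helper names, and the convention that an empty cell means NULL are all assumptions made for illustration; the BiConsumer is a stand-in for whatever code parses the cell and calls the matching group.add overload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical helper, not part of the Parquet API.
class NullSkippingRow {

    // Assumption: an absent or empty CSV cell represents NULL.
    static boolean isNull(String cell) {
        return cell == null || cell.isEmpty();
    }

    // Passes each non-null cell to the sink together with its field index.
    // Null cells are skipped, so the corresponding OPTIONAL field stays unset.
    // Returns the indexes that were skipped.
    static List<Integer> populate(String[] cells, BiConsumer<Integer, String> sink) {
        List<Integer> skipped = new ArrayList<>();
        for (int i = 0; i < cells.length; i++) {
            if (isNull(cells[i])) {
                skipped.add(i); // never call add for this field
            } else {
                sink.accept(i, cells[i]);
            }
        }
        return skipped;
    }
}
```

In the converter, the sink would parse the string according to the column type (for example Integer.parseInt for int32_col) and then call group.add with the field index and the parsed value. This only works for fields declared with Type.Repetition.OPTIONAL, as in the schema above.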