
Deserializing objects in Avro with Map<String,Object> field returns values with wrong class

I am trying to serialize objects that contain a Map field with Apache Avro. The String keys of the Map are deserialized correctly, but the values come back as plain java.lang.Object instances.

I can make it work with a GenericDatumWriter and a GenericData.Record instance that the properties are copied into, but I need to serialize the objects directly, without copying the Map properties into a temporary object just for serialization.

public void test1() throws IOException {

    TimeDot dot = new TimeDot();
    dot.lat = 12;
    dot.lon = 34;
    dot.putProperty("id", 1234);
    dot.putProperty("s", "foo");
    System.out.println("BEFORE: " + dot);

    // serialize
    ReflectDatumWriter<TimeDot> reflectDatumWriter = new ReflectDatumWriter<>(TimeDot.class);
    Schema schema = ReflectData.get().getSchema(TimeDot.class);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DataFileWriter<TimeDot> writer = new DataFileWriter<>(reflectDatumWriter).create(schema, out);
    writer.append(dot);
    writer.close();

    // deserialize
    ReflectDatumReader<TimeDot> reflectDatumReader = new ReflectDatumReader<>(TimeDot.class);
    ByteArrayInputStream inputStream = new ByteArrayInputStream(out.toByteArray());
    DataFileStream<TimeDot> reader = new DataFileStream<>(inputStream, reflectDatumReader);
    Object dot2 = reader.next();
    reader.close();
    System.out.println("AFTER: " + dot2);
}

public static class TimeDot {
    Map<String, Object> props = new LinkedHashMap<>();
    double lat;
    double lon;

    public void putProperty(String key, Object value) {
        props.put(key, value);
    }

    public String toString() {
        return "lat="+ lat +", lon="+ lon +", props="+props;
    }
}

Output:

 BEFORE: lat=12.0, lon=34.0, props={id=1234, s=foo}

 AFTER:  lat=12.0, lon=34.0, props={id=java.lang.Object@2b9627bc, s=java.lang.Object@65e2dbf3}
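A likely reason for this output, offered as an assumption rather than a confirmed diagnosis: a reflection-based schema is derived from the declared field types, not from the runtime values. The declared value type of props is Object, which has no instance fields, so there is nothing to write for each value. The sketch below uses only the JDK (no Avro calls) to show what reflection can see; the class name DeclaredTypeDemo and its helper methods are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.LinkedHashMap;
import java.util.Map;

class DeclaredTypeDemo {

    // Same shape as the TimeDot class in the question.
    static class TimeDot {
        Map<String, Object> props = new LinkedHashMap<>();
        double lat;
        double lon;
    }

    // Returns the declared (compile-time) value type of the Map field "props".
    static Type declaredValueType() {
        try {
            Field f = TimeDot.class.getDeclaredField("props");
            ParameterizedType mapType = (ParameterizedType) f.getGenericType();
            return mapType.getActualTypeArguments()[1]; // index 0 is the key type
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // Counts the non-static fields a reflection-based serializer could write.
    static int instanceFieldCount(Class<?> c) {
        int n = 0;
        for (Field f : c.getDeclaredFields()) {
            if (!Modifier.isStatic(f.getModifiers())) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println("declared value type: " + declaredValueType().getTypeName());
        System.out.println("instance fields on Object: " + instanceFieldCount(Object.class));
    }
}
```

The Integer and String that were actually stored in the map never show up in the declared type, which is why the schema has to name the possible value types explicitly.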

Next I tried creating the Schema manually, but then serialization fails with:

Exception in thread "main" java.lang.NullPointerException: in TimeDot in map in java.lang.Object null of java.lang.Object of map in field props of TimeDot

public void test2() throws IOException {        

    TimeDot dot = new TimeDot();
    dot.lat = 12;
    dot.lon = 34;
    dot.putProperty("id", 1234);
    dot.putProperty("s", "foo");
    System.out.println(dot);

    // create Schema
    List<Schema.Field> propFields = new ArrayList<>();
    propFields.add(new Schema.Field("id", Schema.create(Schema.Type.INT)));
    propFields.add(new Schema.Field("s", Schema.create(Schema.Type.STRING)));
    Schema propRecSchema = Schema.createRecord("Object",null,"java.lang",false,propFields);
    Schema propSchema = Schema.createMap(propRecSchema);
    List<Schema.Field> fields = new ArrayList<>(3);
    fields.add(new Schema.Field("lat", Schema.create(Schema.Type.DOUBLE)));
    fields.add(new Schema.Field("lon", Schema.create(Schema.Type.DOUBLE)));
    fields.add(new Schema.Field("props", propSchema));
    Schema schema = Schema.createRecord("TimeDot", null, "", false, fields);
    System.out.println("\nschema:\n" + schema);

    // serialize
    ReflectDatumWriter<TimeDot> reflectDatumWriter = new ReflectDatumWriter<>(TimeDot.class);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DataFileWriter<TimeDot> writer = new DataFileWriter<>(reflectDatumWriter).create(schema, out);
    writer.append(dot); // *** fails here > NullPointerException ***
    writer.close();

    // deserialize
    ReflectDatumReader<TimeDot> reader = new ReflectDatumReader<>(schema);
    TimeDot dot2 = reader.read(null,
            DecoderFactory.get().binaryDecoder(out.toByteArray(), null));
    System.out.println(dot2);
}

I think the easiest way is to annotate the field with an explicit schema:

@org.apache.avro.reflect.AvroSchema("{\"type\": \"map\", \"values\": [\"string\", \"int\"]}")
Map<String, Object> props = new LinkedHashMap<>();
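The union in the annotation has to list every type that can appear as a map value. As a rough aid for working out that list, the sketch below scans the values of a sample map and prints the matching map schema as JSON. It uses only the JDK; the class MapValueUnionSketch and its methods are hypothetical helpers, not part of Avro, and only a few primitive types are covered.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

class MapValueUnionSketch {

    // Maps a Java value to the name of the Avro primitive type it corresponds to.
    static String avroTypeName(Object value) {
        if (value == null) return "null";
        if (value instanceof Integer) return "int";
        if (value instanceof Long) return "long";
        if (value instanceof Float) return "float";
        if (value instanceof Double) return "double";
        if (value instanceof Boolean) return "boolean";
        if (value instanceof CharSequence) return "string";
        throw new IllegalArgumentException(
                "No primitive Avro type for " + value.getClass().getName());
    }

    // Builds the JSON for a map schema whose values are a union of the types seen.
    static String mapSchemaJson(Map<String, ?> props) {
        Set<String> branches = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (Object v : props.values()) {
            branches.add(avroTypeName(v));
        }
        StringJoiner union = new StringJoiner(", ", "[", "]");
        for (String b : branches) {
            union.add("\"" + b + "\"");
        }
        return "{\"type\": \"map\", \"values\": " + union + "}";
    }

    public static void main(String[] args) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("id", 1234);
        props.put("s", "foo");
        // prints {"type": "map", "values": ["int", "string"]}
        System.out.println(mapSchemaJson(props));
    }
}
```

Annotation values must be compile-time constants, so the printed JSON would be pasted into @AvroSchema by hand. A sample map only reveals the types it happens to contain, so the resulting union should be checked against everything the application may store.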

To serialize an object that contains a Map<String, Object>, the Avro schema must declare the map's values as a union that lists every possible value type.

IMPORTANT: If the namespace is not set correctly, so that the record's full name does not resolve to the TimeDot class, deserialization returns a GenericData.Record rather than a TimeDot instance.

public void test3() throws IOException {

    List<Schema.Field> fields = new ArrayList<>();
    fields.add(new Schema.Field("lat", Schema.create(Schema.Type.DOUBLE)));
    fields.add(new Schema.Field("lon", Schema.create(Schema.Type.DOUBLE)));
    fields.add(new Schema.Field("props", Schema.createMap(
            Schema.createUnion(Arrays.asList(
                Schema.create(Schema.Type.INT),
                Schema.create(Schema.Type.STRING))))));

    Schema schema = Schema.createRecord("TimeDot", null, "TestAvroUnion", false, fields);

    TimeDot dot = new TimeDot();
    dot.lat = 12;
    dot.lon = 34;
    dot.putProperty("id", 1234);
    dot.putProperty("s", "foo");
    System.out.println("BEFORE: " + dot);

    // serialize
    ReflectDatumWriter<TimeDot> reflectDatumWriter = new ReflectDatumWriter<>(schema);
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    DataFileWriter<TimeDot> dataWriter = new DataFileWriter<>(reflectDatumWriter);
    dataWriter.create(schema, out);
    dataWriter.append(dot);
    dataWriter.close();

    // deserialize
    ReflectDatumReader<TimeDot> reflectDatumReader = new ReflectDatumReader<>(schema);
    try(
        ByteArrayInputStream bis = new ByteArrayInputStream(out.toByteArray());
        DataFileStream<TimeDot> reader = new DataFileStream<>(bis, reflectDatumReader)
    ) {
        TimeDot dot2 = reader.next();
        System.out.println("AFTER:  " + dot2);
    }
}

The output is as follows:

 BEFORE: lat=12.0, lon=34.0, props={id=1234, s=foo}
 AFTER:  lat=12.0, lon=34.0, props={id=1234, s=foo}

Alternatively, use SchemaBuilder to create the schema:

 Schema schema = SchemaBuilder
            .record("TimeDot")
            .namespace("TestAvroUnion")
            .fields()
            .name("lat")
                .type().doubleType()
                .noDefault()
            .name("lon")
                .type().doubleType()
                .noDefault()
            .name("props")
                .type().map()
                    .values(SchemaBuilder.unionOf().intType().and().stringType().endUnion())
                .noDefault()
            .endRecord();
