
Avro Json.ObjectWriter - “Not the Json schema” error

I'm writing a tool to convert data from a homegrown format to Avro, JSON and Parquet, using Avro 1.8.0. Conversion to Avro and Parquet is working okay, but JSON conversion throws the following error:

Exception in thread "main" java.lang.RuntimeException: Not the Json schema:
{"type":"record","name":"Torperf","namespace":"converTor.torperf",
"fields":[{"name":"descriptor_type","type":"string"," 
[... rest of the schema omitted for brevity]

Irritatingly, this is exactly the schema that I passed in and that I want the converter to use. I have no idea what Avro is complaining about. This is the relevant snippet of my code:

//  parse the schema file
Schema.Parser parser = new Schema.Parser();
Schema mySchema;
//  tried two ways to load the schema
//  like this
File schemaFile = new File("myJsonSchema.avsc");
mySchema = parser.parse(schemaFile) ;
//  and also the way Json.class loads its schema
mySchema = parser.parse(Json.class.getResourceAsStream("myJsonSchema.avsc"));

//  initialize the writer
Json.ObjectWriter jsonDatumWriter = new Json.ObjectWriter();
jsonDatumWriter.setSchema(mySchema);
OutputStream out = new FileOutputStream(new File("output.avro"));
Encoder encoder = EncoderFactory.get().jsonEncoder(mySchema, out);

//  append a record created by way of a specific mapping
jsonDatumWriter.write(specificRecord, encoder);

I replaced myJsonSchema.avsc with the schema printed in the exception, without success (apart from whitespace and line breaks, the two are identical). Initializing the jsonEncoder with org.apache.avro.data.Json.SCHEMA instead of mySchema didn't change anything either. Replacing the schema passed to Json.ObjectWriter with org.apache.avro.data.Json.SCHEMA leads to a NullPointerException at org.apache.avro.data.Json.write(Json.java:183) (which is a deprecated method).

From staring at org.apache.avro.data.Json.java, it seems to me that Avro is checking my record schema against its own schema of a Json record (line 58) for equality (line 73).

58  SCHEMA = Schema.parse(Json.class.getResourceAsStream("/org/apache/avro/data/Json.avsc"));

72  public void setSchema(Schema schema) {
73    if(!Json.SCHEMA.equals(schema))
74      throw new RuntimeException("Not the Json schema: " + schema);
75  }

The referenced Json.avsc defines the field types of a record:

{"type": "record", "name": "Json", "namespace":"org.apache.avro.data",
 "fields": [
     {"name": "value",
      "type": [
          "long",
          "double",
          "string",
          "boolean",
          "null",
          {"type": "array", "items": "Json"},
          {"type": "map", "values": "Json"}
      ]
     }
 ]
}
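To make the shape of that schema concrete, here is a minimal plain-Java sketch (no Avro dependency) of which Java values would fit the union above. The class and method names are hypothetical, and mapping long/double/array/map to Long/Double/List/Map is an assumption made for illustration, not Avro's actual implementation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JsonShapeSketch {

    // Returns true if the value matches one branch of the union in Json.avsc:
    // long, double, string, boolean, null, array of Json, map of Json.
    static boolean fitsJsonSchema(Object value) {
        if (value == null) {
            return true;                                   // "null" branch
        }
        if (value instanceof Long || value instanceof Double
                || value instanceof String || value instanceof Boolean) {
            return true;                                   // scalar branches
        }
        if (value instanceof List) {                       // {"type": "array", "items": "Json"}
            for (Object item : (List<?>) value) {
                if (!fitsJsonSchema(item)) {
                    return false;
                }
            }
            return true;
        }
        if (value instanceof Map) {                        // {"type": "map", "values": "Json"}
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                // Avro map keys are strings; values must again fit the union
                if (!(entry.getKey() instanceof String) || !fitsJsonSchema(entry.getValue())) {
                    return false;
                }
            }
            return true;
        }
        return false;                                      // anything else, e.g. a record class
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("sizes", Arrays.<Object>asList(1L, 2.5, "x"));
        System.out.println(fitsJsonSchema(doc));           // nested map/list of scalars
        System.out.println(fitsJsonSchema(new Object()));  // an arbitrary object
    }
}
```

Read this way, the Json schema describes free-form JSON-like data (scalars, lists, maps) rather than a record with named fields such as Torperf, which is why the two schemas can never compare equal.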

equals is implemented in org.apache.avro.Schema, line 346:

  public boolean equals(Object o) {
    if(o == this) {
      return true;
    } else if(!(o instanceof Schema)) {
      return false;
    } else {
      Schema that = (Schema)o;
      return this.type != that.type?false:this.equalCachedHash(that) && this.props.equals(that.props);
    }
  }

I don't fully understand what's going on in the third check (especially equalCachedHash()), but all I can see are trivial equality checks, which doesn't make sense to me.
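For what it's worth, a cached-hash comparison like this is usually just a fast negative test: if both objects have already computed their hash codes and the codes differ, the objects cannot be equal, and the real structural comparison (names, fields) happens elsewhere. The plain-Java sketch below illustrates that pattern only. MiniSchema, NO_HASH and the field layout are hypothetical stand-ins, not Avro classes, and how closely Avro's own implementation matches is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class CachedHashSketch {

    // Hypothetical stand-in for a record schema: a name plus a list of field names.
    static final class MiniSchema {
        private static final int NO_HASH = Integer.MIN_VALUE; // marker for "not computed yet"

        final String name;
        final List<String> fields;
        private int cachedHash = NO_HASH;

        MiniSchema(String name, String... fields) {
            this.name = name;
            this.fields = Arrays.asList(fields);
        }

        @Override
        public int hashCode() {
            if (cachedHash == NO_HASH) {
                cachedHash = Objects.hash(name, fields);      // compute once, then reuse
            }
            return cachedHash;
        }

        // Shortcut only: false means "definitely different";
        // true means "unknown, go on with the full comparison".
        boolean equalCachedHash(MiniSchema other) {
            return cachedHash == other.cachedHash
                    || cachedHash == NO_HASH
                    || other.cachedHash == NO_HASH;
        }

        @Override
        public boolean equals(Object o) {
            if (o == this) return true;
            if (!(o instanceof MiniSchema)) return false;
            MiniSchema that = (MiniSchema) o;
            if (!equalCachedHash(that)) return false;         // cheap early exit
            return name.equals(that.name) && fields.equals(that.fields);
        }
    }

    public static void main(String[] args) {
        MiniSchema json = new MiniSchema("Json", "value");
        MiniSchema jsonCopy = new MiniSchema("Json", "value");
        MiniSchema torperf = new MiniSchema("Torperf", "descriptor_type");

        System.out.println(json.equals(jsonCopy)); // true: same name and fields
        System.out.println(json.equals(torperf));  // false: different name and fields
    }
}
```

Under this reading, the check in setSchema() is a genuine structural comparison, so any schema other than the Json one is rejected no matter how it was loaded.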

Also I can't find any examples or notes about usage of Avro's Json.ObjectWriter on the InterWebs. I wonder if I should go with the deprecated Json.Writer instead because there are at least a few code snippets online to learn and glean from.

The full source is available at https://github.com/tomlurge/converTor

Thanks,
Thomas

A little more debugging proved that passing org.apache.avro.data.Json.SCHEMA to Json.ObjectWriter is indeed the right thing to do. The object I get back, written to System.out, prints the JSON object that I expect. The NullPointerException, though, did not go away. I probably did not need to call setSchema() on Json.ObjectWriter at all, since omitting the call altogether leads to the same NullPointerException.

I finally filed a bug with Avro, and it turned out that in my code I was passing a specific record object to ObjectWriter, which it couldn't handle. ObjectWriter returned silently, though, and an error was thrown only at a later stage. That was fixed in Avro 1.8.1 - see https://issues.apache.org/jira/browse/AVRO-1807 for details.
