
JsonParseException: Invalid UTF-8 middle byte 0x2d

I have a JSON string from which I create an InputStream, as shown below. I then build a GenericRecord from it, because I am trying to serialize my JSON object against an Avro schema.

InputStream input = new ByteArrayInputStream(jsonString.getBytes());
DataInputStream din = new DataInputStream(input);

Decoder decoder = DecoderFactory.get().jsonDecoder(schema, din);

DatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
// the line below throws the exception
GenericRecord datum = reader.read(null, decoder);

Below is the exception I am getting:

org.codehaus.jackson.JsonParseException: Invalid UTF-8 middle byte 0x2d at [Source: java.io.DataInputStream@562aee31; line: 1, column: 74]

And here is the actual JSON string on which this exception is happening:

{"name":"car_test","attr_value":"2006|Renault|Megane II Coupé-Cabriolet|null|null|null|null|0|Wed Feb 03 10:00:59 GMT-07:00 2016|1|77|null|null|null|null","data_id":900}

I did some research and found that I need to create the ByteArrayInputStream from UTF-8 encoded bytes, as shown below:

InputStream input = new ByteArrayInputStream(jsonString.getBytes(StandardCharsets.UTF_8));

But my question is: what causes this exception, and why does it happen with the JSON string above? And is using UTF-8 the right fix?

What does the error "Invalid UTF-8 middle byte 0x2d" mean?

You start with a Java Unicode String, jsonString.

You then convert it into a byte stream using String.getBytes(). Since you didn't specify the byte encoding, the platform default is used, which in your case is most likely ISO 8859-1.

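Now you parse the JSON from the (Data)InputStream. Avro, through Jackson, seems to use UTF-8 to decode the bytes. In ISO 8859-1 the é is the single byte 0xE9, which a UTF-8 decoder reads as the lead byte of a three-byte sequence, so it expects continuation bytes to follow. The next byte is 0x2d, the hyphen after the é in "Coupé-Cabriolet", which is not a valid continuation byte. That is why the message says "Invalid UTF-8 middle byte 0x2d".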

So in the end it is a mismatch between the actual encoding (ISO 8859-1) and the expected encoding (UTF-8).
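The mismatch can be reproduced with the JDK alone. The following is only a sketch: the class and method names are made up for illustration, and it uses a strict java.nio decoder rather than Jackson's own parser, so it shows the same byte-level problem, not the exact code path that throws in your program.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Render bytes as hex so the difference between the encodings is visible.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    // True if the bytes are well-formed UTF-8. The decoder is configured to
    // report malformed input instead of silently substituting U+FFFD.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "é-C" written with an escape so the source file encoding does not matter.
        String fragment = "\u00e9-C";

        byte[] latin1 = fragment.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = fragment.getBytes(StandardCharsets.UTF_8);

        // ISO 8859-1: é is one byte (e9), immediately followed by the hyphen (2d).
        System.out.println(hex(latin1) + "  valid UTF-8? " + isValidUtf8(latin1));
        // UTF-8: é is two bytes (c3 a9), then the hyphen (2d).
        System.out.println(hex(utf8) + "  valid UTF-8? " + isValidUtf8(utf8));
    }
}
```

The first line shows the byte 0xE9 followed directly by 0x2d, which is the pair the parser complains about. The second shows the same text encoded so that a UTF-8 reader accepts it.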

You can solve this the way you did, or simply avoid going from string to bytes at all:

Decoder decoder = DecoderFactory.get().jsonDecoder(schema, jsonString);
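If you do have to hand over bytes (for example because an API only accepts an InputStream), the safe habit is to name the charset explicitly rather than rely on the platform default. A minimal sketch using only the JDK follows. The helper names utf8Bytes and toUtf8Stream are hypothetical and not part of Avro or Jackson.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8StreamSketch {

    // Hypothetical helper: always encode as UTF-8, never the platform default.
    static byte[] utf8Bytes(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: wrap the UTF-8 bytes for APIs that want a stream.
    static InputStream toUtf8Stream(String json) {
        return new ByteArrayInputStream(utf8Bytes(json));
    }

    public static void main(String[] args) {
        // The charset a bare getBytes() would use on this machine.
        System.out.println("Platform default charset: " + Charset.defaultCharset());

        String json = "{\"attr_value\":\"Coup\u00e9-Cabriolet\"}";
        byte[] bytes = utf8Bytes(json);

        // Decoding with the same charset restores the original string.
        String roundTripped = new String(bytes, StandardCharsets.UTF_8);
        System.out.println("Round trip intact: " + roundTripped.equals(json));
    }
}
```

Printing Charset.defaultCharset() is also a quick way to check what a bare getBytes() would have used on the machine where the exception occurred.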

Instead of using the low-level Avro decoder functionality, you might be interested in a more convenient approach with the Jackson Avro module:

https://github.com/FasterXML/jackson-dataformat-avro/issues

in which you can provide an Avro schema but still work with POJOs. You can also use regular Jackson JSON databinding (a different ObjectMapper) for the JSON part: read JSON into a POJO, then write the POJO as Avro (or other combinations).

