Avro Message - InvalidNumberEncodingException deserializing logicalType date

I get an exception when I deserialize a message that contains a field with logicalType date. Following the documentation, the field is defined as:

{"name": "startDate", "type": {"type": "int", "logicalType": "date"}}

I use "avro-maven-plugin" (1.9.2) to generate the Java classes, and I can set the field startDate to java.time.LocalDate.now(). The Avro object is serialized and the message is sent to a Kafka topic. So far, everything is good.

However, when I read the message I get the exception:

Caused by: org.apache.avro.InvalidNumberEncodingException: Invalid int encoding
    at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:166)
    at org.apache.avro.io.ValidatingDecoder.readInt(ValidatingDecoder.java:83)
    at org.apache.avro.generic.GenericDatumReader.readInt(GenericDatumReader.java:551)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:195)
    at org.apache.avro.generic.GenericDatumReader.readWithConversion(GenericDatumReader.java:173)
    at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:134)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
    at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)

What makes this even stranger is that no error occurs if I set a different date, such as LocalDate.of(1970, 1, 1).

In other words, if the serialized int value representing the number of days since 01/01/1970 is small enough, everything works fine. I tried that test after looking at the code that raises the exception, which made me think the error could be avoided if the day value does not exceed 127:

   public int readInt() throws IOException {
        this.ensureBounds(5);
        int len = 1;
        int b = this.buf[this.pos] & 255;
        int n = b & 127;
        if (b > 127) {
            b = this.buf[this.pos + len++] & 255;
            n ^= (b & 127) << 7;
            if (b > 127) {
                b = this.buf[this.pos + len++] & 255;
                n ^= (b & 127) << 14;
                if (b > 127) {
                    b = this.buf[this.pos + len++] & 255;
                    n ^= (b & 127) << 21;
                    if (b > 127) {
                        b = this.buf[this.pos + len++] & 255;
                        n ^= (b & 127) << 28;
                        if (b > 127) {
                            throw new InvalidNumberEncodingException("Invalid int encoding");
                        }
                    }
                }
            }
        }
....

Of course, in production I can't restrict myself to dates close to 01/01/1970. Any help is welcome :-)

TL;DR

The code that you have posted can deserialize numbers not only up to 127 but across the full range of a Java int, so up to a couple of billion, which corresponds to dates several million years after 1970.

Details

The BinaryDecoder.readInt method from Apache Avro deserializes from 1 through 5 bytes into a Java int. It uses the low 7 bits of each byte for the int and leaves out the sign bit (the most significant bit of the byte). Instead, the sign bit determines how many bytes to read. A sign bit of 0 means this is the last byte. A sign bit of 1 means there are more bytes after this one. The exception is thrown when 5 bytes have been read and all of them had their sign bit set to 1. 5 bytes can supply 35 bits, and an int can hold 32 bits, so treating more than 5 bytes as an error is fair.
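The nested ifs in the posted method can also be read as a loop, which may make the rule about the sign bit easier to see. The following is only a sketch of that reading, not Avro's code: the class name VarIntLoopSketch and the method decode are made up for illustration, and it throws an unchecked IllegalArgumentException where Avro throws InvalidNumberEncodingException. Like the posted excerpt, it stops once the bytes have been combined. The Avro specification also describes zig-zag coding for int values, which would happen in the part of the method elided by "....", so that step is left out here.

```java
/** Hypothetical helper, not part of Avro: loop form of the nested-if decoding. */
final class VarIntLoopSketch {

    /** Combines up to 5 bytes, starting at buf[0], into an int. */
    static int decode(byte[] buf) {
        int n = 0;
        for (int i = 0; i < 5; i++) {
            if (i >= buf.length) {
                throw new IllegalArgumentException("Ran out of bytes");
            }
            int b = buf[i] & 0xFF;        // read the byte as unsigned, 0..255
            n |= (b & 0x7F) << (7 * i);   // the low 7 bits carry the payload
            if (b <= 0x7F) {              // sign bit 0: this was the last byte
                return n;
            }
        }
        // Five bytes read and every one had its sign bit set.
        throw new IllegalArgumentException("Invalid int encoding");
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {127}));              // one byte
        System.out.println(decode(new byte[] {(byte) 0xFF, 127})); // two bytes
    }
}
```

Using |= instead of the ^= in the posted code gives the same result here, because each byte contributes to bit positions that no other byte touches.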

So from the code that you have posted no dates that I would reasonably expect to use in an application will pose any problems.
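To get a feeling for the range, the two limits can be turned into dates with java.time. This is just a small sketch: the class name EpochDayRangeSketch and its methods are invented for the illustration, and it only shows which date an epoch day of 127 and of Integer.MAX_VALUE would represent.

```java
import java.time.LocalDate;

/** Hypothetical helper: which dates do the smallest and largest limits map to? */
final class EpochDayRangeSketch {

    /** Date for epoch day 127, the limit suspected in the question. */
    static LocalDate singleByteLimit() {
        return LocalDate.ofEpochDay(127);
    }

    /** Date for the largest epoch day that fits in a Java int. */
    static LocalDate intLimit() {
        return LocalDate.ofEpochDay(Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        System.out.println(singleByteLimit()); // 1970-05-08
        System.out.println(intLimit());        // a year several million years ahead
    }
}
```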

Test code

I put your method in a TestBinaryDecoder class to try it out (full code at the end). Let's first see how the exception comes from 5 bytes all having their sign bit set to 1:

    try {
        System.out.println(new TestBinaryDecoder(-1, -1, -1, -1, -1).readInt());
    } catch (IOException ioe) {
        System.out.println(ioe);
    }

Output:

 ovv.so.binary.misc.InvalidNumberEncodingException: Invalid int encoding

Also as you said, 127 poses no problem:

    System.out.println(new TestBinaryDecoder(127, -1, -1, -1, -1).readInt());

Output:

 127

The interesting part comes when we put more bytes in holding bits of the int that we want. Here the first byte has a sign bit of 1, the next has 0, so I expect those two bytes to be used:

    System.out.println(new TestBinaryDecoder(255, 127, -1, -1, -1).readInt());

Output:

 16383

We are already getting close to the number needed for today's date. Today is 2021-06-04 in my time zone, day 18782 after the epoch, or in binary: 100100101011110. So let's try putting those 15 binary digits into three bytes for the decoder:

    int epochDay = new TestBinaryDecoder(0b11011110, 0b10010010, 0b1, -1, -1).readInt();
    System.out.println(epochDay);
    System.out.println(LocalDate.ofEpochDay(epochDay));

Output:

 18782
 2021-06-04
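The three bytes used for 18782 can also be derived mechanically instead of by hand. The sketch below does the reverse of the posted method: it cuts the int into 7-bit groups, lowest group first, and sets the sign bit on every byte except the last. The class name VarIntEncodeSketch and the method encode are hypothetical, and this is not Avro's encoder. It only mirrors the byte layout that the posted excerpt reads, for non-negative values.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical helper: splits an int into the 7-bit groups the posted decoder reads. */
final class VarIntEncodeSketch {

    /** Returns 1 to 5 bytes; every byte but the last has its sign bit set. */
    static byte[] encode(int n) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((n & ~0x7F) != 0) {          // more than 7 bits still to write
            out.write((n & 0x7F) | 0x80);   // low 7 bits plus the "more bytes follow" bit
            n >>>= 7;
        }
        out.write(n);                       // last byte, sign bit 0
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Prints 11011110, 10010010 and 1, the bytes passed to the decoder above.
        for (byte b : encode(18782)) {
            System.out.println(Integer.toBinaryString(b & 0xFF));
        }
    }
}
```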

So how you got your exception I can't tell. The source surely isn't just a large int value. The problem must be somewhere else.

Full code

import java.io.IOException;
import java.util.stream.IntStream;

public class TestBinaryDecoder {
    
    private byte[] buf;
    private int pos;
    
    /** Convenience constructor */
    public TestBinaryDecoder(int... buf) {
        this(toByteArray(buf));
    }
    
    private static byte[] toByteArray(int[] intArray) {
        byte[] byteArray = new byte[intArray.length];
        IntStream.range(0, intArray.length).forEach(ix -> byteArray[ix] = (byte) intArray[ix]);
        return byteArray;
    }

    public TestBinaryDecoder(byte[] buf) {
        this.buf = buf;
        pos = 0;
    }

    public int readInt() throws IOException {
        this.ensureBounds(5);
        int len = 1;
        int b = this.buf[this.pos] & 255;
        int n = b & 127;
        if (b > 127) {
            b = this.buf[this.pos + len++] & 255;
            n ^= (b & 127) << 7;
            if (b > 127) {
                b = this.buf[this.pos + len++] & 255;
                n ^= (b & 127) << 14;
                if (b > 127) {
                    b = this.buf[this.pos + len++] & 255;
                    n ^= (b & 127) << 21;
                    if (b > 127) {
                        b = this.buf[this.pos + len++] & 255;
                        n ^= (b & 127) << 28;
                        if (b > 127) {
                            throw new InvalidNumberEncodingException("Invalid int encoding");
                        }
                    }
                }
            }
        }
        return n;
    }
    
    private void ensureBounds(int bounds) {
        System.out.println("Ensuring bounds " + bounds);
    }

}

class InvalidNumberEncodingException extends IOException {

    public InvalidNumberEncodingException(String message) {
        super(message);
    }
    
}
