
Avro Message - InvalidNumberEncodingException deserializing logicalType date

I get an exception when I deserialize a message with a field defined as logicalType date. Following the documentation, the field is defined as:

{"name": "startDate", "type": {"type": "int", "logicalType": "date"}}

I use avro-maven-plugin (1.9.2) to generate the Java classes, and I can set the field startDate to java.time.LocalDate.now(). The Avro object is serialized and the message is sent to a Kafka topic. So far, everything is good.

However, when I read the message I get this exception:

Caused by: org.apache.avro.InvalidNumberEncodingException: Invalid int encoding
    at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:166)
    at org.apache.avro.io.ValidatingDecoder.readInt(ValidatingDecoder.java:83)
    at org.apache.avro.generic.GenericDatumReader.readInt(GenericDatumReader.java:551)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:195)
    at org.apache.avro.generic.GenericDatumReader.readWithConversion(GenericDatumReader.java:173)
    at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:134)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
    at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)

What makes this even stranger is that no error occurs if I set a different date such as LocalDate.of(1970, 1, 1).

In other words, if the serialized int value representing the number of days since 01/01/1970 is small enough, everything works fine. I tried that test after looking at the code that raises the exception; it made me think that the error could be avoided if the day number is lower than 127:

   public int readInt() throws IOException {
        this.ensureBounds(5);
        int len = 1;
        int b = this.buf[this.pos] & 255;
        int n = b & 127;
        if (b > 127) {
            b = this.buf[this.pos + len++] & 255;
            n ^= (b & 127) << 7;
            if (b > 127) {
                b = this.buf[this.pos + len++] & 255;
                n ^= (b & 127) << 14;
                if (b > 127) {
                    b = this.buf[this.pos + len++] & 255;
                    n ^= (b & 127) << 21;
                    if (b > 127) {
                        b = this.buf[this.pos + len++] & 255;
                        n ^= (b & 127) << 28;
                        if (b > 127) {
                            throw new InvalidNumberEncodingException("Invalid int encoding");
                        }
                    }
                }
            }
        }
....

Of course, I can't restrict production data to dates close to 01/01/1970. Any help is welcome :-)

TL;DR

The code that you have posted can deserialize numbers not only up to 127 but across the full range of a Java int, so up to a couple of billion, corresponding to dates several million years after 1970.
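To put a number on that range, here is a small check of which date the largest Java int maps to when read as days since the epoch. It is a sketch using only java.time; the class name MaxAvroDate and the helper fromEpochDay are made up for illustration.

```java
import java.time.LocalDate;

final class MaxAvroDate {
    /** Reads an int as days since 1970-01-01, which is how the date logical type is defined. */
    static LocalDate fromEpochDay(int days) {
        return LocalDate.ofEpochDay(days);
    }

    public static void main(String[] args) {
        // The largest int is roughly 2.1 billion days, i.e. several million years.
        System.out.println(fromEpochDay(Integer.MAX_VALUE).getYear());
        // Day 0 is the epoch itself.
        System.out.println(fromEpochDay(0));
    }
}
```

The first line printed is a year in the millions, far beyond any date an application is likely to store.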

Details

The BinaryDecoder.readInt method from Apache Avro deserializes from 1 through 5 bytes into a Java int. It uses the low 7 bits of each byte for the int and leaves out the sign bit (the highest bit of the byte). Instead, the sign bit determines how many bytes to read: a sign bit of 0 means this is the last byte, and a sign bit of 1 means more bytes follow. The exception is thrown when 5 bytes have been read and all of them had their sign bit set to 1. 5 bytes can supply 35 bits and an int can hold 32 bits, so treating more than 5 bytes as an error is fair.
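The same rule can be written as a loop, which may make the role of the sign bit easier to see. This is a minimal sketch, not Avro's implementation: the class name VarIntSketch is hypothetical, it throws an unchecked IllegalArgumentException where Avro throws InvalidNumberEncodingException, and it assumes the array holds a complete value starting at index 0.

```java
final class VarIntSketch {
    /** Decodes one variable-length int from the start of buf, mirroring the posted excerpt. */
    static int decode(byte[] buf) {
        int n = 0;
        for (int i = 0; i < 5; i++) {
            int b = buf[i] & 0xFF;        // treat the byte as unsigned
            n |= (b & 0x7F) << (7 * i);   // the low 7 bits carry the payload
            if (b <= 0x7F) {              // sign bit 0: this was the last byte
                return n;
            }
        }
        // Five bytes in a row announced a continuation.
        throw new IllegalArgumentException("Invalid int encoding");
    }

    public static void main(String[] args) {
        System.out.println(decode(new byte[] {127}));              // one byte
        System.out.println(decode(new byte[] {(byte) 255, 127}));  // two bytes
    }
}
```

The loop uses |= where the excerpt uses ^=; the two give the same result here because each 7-bit group lands on bits that are still zero.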

So judging from the code that you have posted, no date that I would reasonably expect an application to use will pose any problem.
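One caveat: the excerpt in the question stops at "....", so the end of the method is not shown. The Avro specification states that int values are written using variable-length zig-zag coding, so the number assembled from the bytes is presumably zig-zag decoded before it is returned. Below is a minimal sketch of that mapping, using the standard zig-zag formulas; the class and method names are illustrative and not Avro's API.

```java
final class ZigZagSketch {
    /** Maps 0, -1, 1, -2, 2 ... to 0, 1, 2, 3, 4 ... so small magnitudes stay small. */
    static int encode(int n) {
        return (n << 1) ^ (n >> 31);
    }

    /** Inverse of encode. */
    static int decode(int z) {
        return (z >>> 1) ^ -(z & 1);
    }

    public static void main(String[] args) {
        int epochDay = 18782;                    // 2021-06-04
        int onTheWire = encode(epochDay);
        System.out.println(onTheWire);           // 37564
        System.out.println(decode(onTheWire));   // 18782
    }
}
```

Zig-zag coding doubles a non-negative value, so a present-day epoch day still fits in three bytes and the conclusion stays the same.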

Test code

I put your method in a TestBinaryDecoder class to try it out (full code at the end). Let's first see how the exception arises from 5 bytes that all have their sign bit set to 1:

    try {
        System.out.println(new TestBinaryDecoder(-1, -1, -1, -1, -1).readInt());
    } catch (IOException ioe) {
        System.out.println(ioe);
    }

Output:

 ovv.so.binary.misc.InvalidNumberEncodingException: Invalid int encoding

Also, as you said, 127 poses no problem:

    System.out.println(new TestBinaryDecoder(127, -1, -1, -1, -1).readInt());
 127

The interesting part comes when we use more bytes to hold the bits of the int that we want. Here the first byte has a sign bit of 1 and the next has 0, so I expect those two bytes to be used:

    System.out.println(new TestBinaryDecoder(255, 127, -1, -1, -1).readInt());
 16383

We are already getting close to the number needed for today's date. Today is 2021-06-04 in my time zone, day 18782 after the epoch, or in binary: 100100101011110. So let's try putting those 15 binary digits into three bytes for the decoder:

    int epochDay = new TestBinaryDecoder(0b11011110, 0b10010010, 0b1, -1, -1).readInt();
    System.out.println(epochDay);
    System.out.println(LocalDate.ofEpochDay(epochDay));
 18782
 2021-06-04
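The three bytes above were worked out by hand. For reference, here is a sketch of the reverse direction that splits an int into 7-bit groups, lowest group first. Like the decoder as posted, it applies no zig-zag step, and the class name VarIntEncodeSketch is made up for this example.

```java
import java.io.ByteArrayOutputStream;

final class VarIntEncodeSketch {
    /** Writes n as 7-bit groups, lowest first, setting the sign bit on all but the last byte. */
    static byte[] encode(int n) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((n & ~0x7F) != 0) {          // more than 7 bits left
            out.write((n & 0x7F) | 0x80);   // payload plus continuation bit
            n >>>= 7;
        }
        out.write(n);                       // last byte, sign bit 0
        return out.toByteArray();
    }

    public static void main(String[] args) {
        for (byte b : encode(18782)) {
            System.out.printf("%02x ", b);  // de 92 01
        }
        System.out.println();
    }
}
```

The bytes de, 92 and 01 are the same values as 0b11011110, 0b10010010 and 0b1 passed to the decoder above.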

How you got your exception I can't tell. The cause surely isn't just a large int value. The problem must be somewhere else.

Full code

import java.io.IOException;
import java.util.stream.IntStream;

public class TestBinaryDecoder {
    
    private byte[] buf;
    private int pos;
    
    /** Convenience constructor */
    public TestBinaryDecoder(int... buf) {
        this(toByteArray(buf));
    }
    
    private static byte[] toByteArray(int[] intArray) {
        byte[] byteArray = new byte[intArray.length];
        IntStream.range(0, intArray.length).forEach(ix -> byteArray[ix] = (byte) intArray[ix]);
        return byteArray;
    }

    public TestBinaryDecoder(byte[] buf) {
        this.buf = buf;
        pos = 0;
    }

    public int readInt() throws IOException {
        this.ensureBounds(5);
        int len = 1;
        int b = this.buf[this.pos] & 255;
        int n = b & 127;
        if (b > 127) {
            b = this.buf[this.pos + len++] & 255;
            n ^= (b & 127) << 7;
            if (b > 127) {
                b = this.buf[this.pos + len++] & 255;
                n ^= (b & 127) << 14;
                if (b > 127) {
                    b = this.buf[this.pos + len++] & 255;
                    n ^= (b & 127) << 21;
                    if (b > 127) {
                        b = this.buf[this.pos + len++] & 255;
                        n ^= (b & 127) << 28;
                        if (b > 127) {
                            throw new InvalidNumberEncodingException("Invalid int encoding");
                        }
                    }
                }
            }
        }
        return n;
    }
    
    private void ensureBounds(int bounds) {
        System.out.println("Ensuring bounds " + bounds);
    }

}

class InvalidNumberEncodingException extends IOException {

    public InvalidNumberEncodingException(String message) {
        super(message);
    }
    
}
