Avro Message - InvalidNumberEncodingException deserializing logicalType date
I get an exception when I deserialize a message with a field defined as logicalType date. Following the documentation, the field is defined as:
{"name": "startDate", "type": {"type": "int", "logicalType": "date"}}
I use "avro-maven-plugin" (1.9.2) to generate the Java classes, and I can set the field startDate to java.time.LocalDate.now(); the Avro object is serialized and the message is sent to a Kafka topic. So far, everything is good.
However, when I read the message I get the exception:
Caused by: org.apache.avro.InvalidNumberEncodingException: Invalid int encoding
at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:166)
at org.apache.avro.io.ValidatingDecoder.readInt(ValidatingDecoder.java:83)
at org.apache.avro.generic.GenericDatumReader.readInt(GenericDatumReader.java:551)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:195)
at org.apache.avro.generic.GenericDatumReader.readWithConversion(GenericDatumReader.java:173)
at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:134)
at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
What makes everything even weirder is that no error occurs if I set a different date like LocalDate.of(1970, 1, 1). In other words, if the serialized int value representing the number of days since 01/01/1970 is small enough, everything works fine. I tried that test after looking at the code that raises the exception; it made me think that the error could be avoided if the int day is lower than 127:
public int readInt() throws IOException {
    this.ensureBounds(5);
    int len = 1;
    int b = this.buf[this.pos] & 255;
    int n = b & 127;
    if (b > 127) {
        b = this.buf[this.pos + len++] & 255;
        n ^= (b & 127) << 7;
        if (b > 127) {
            b = this.buf[this.pos + len++] & 255;
            n ^= (b & 127) << 14;
            if (b > 127) {
                b = this.buf[this.pos + len++] & 255;
                n ^= (b & 127) << 21;
                if (b > 127) {
                    b = this.buf[this.pos + len++] & 255;
                    n ^= (b & 127) << 28;
                    if (b > 127) {
                        throw new InvalidNumberEncodingException("Invalid int encoding");
                    }
                }
            }
        }
    }
    ....
Of course, in production I can't use only dates close to 01/01/1970. Any help is welcome :-)
The code that you have posted can deserialize numbers not only up to 127, but across the full range of a Java int, so up to a couple of billion, corresponding to dates several million years after 1970.
The BinaryDecoder.readInt method from Apache Avro deserializes from 1 through 5 bytes into a Java int. It uses the last 7 bits of each byte for the int, that is, every bit except the sign bit (the highest bit of the byte). The sign bit is instead used for determining how many bytes to read. A sign bit of 0 means this is the last byte. A sign bit of 1 means there are more bytes after this one. The exception is thrown when 5 bytes have been read and all of them had their sign bit set to 1. 5 bytes can supply 35 bits, and an int can hold 32 bits, so regarding more than 5 bytes as an error is fair.
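To make the byte layout more concrete, here is a rough sketch of the writing side. It is not Avro's actual encoder; the class and method names are made up for illustration. One caveat: the Avro specification says that ints are written with variable-length zig-zag coding, and the zig-zag step is not visible in the posted excerpt, which is cut off at "....". So encode below reproduces only the 7-bit grouping that the excerpt shows, and the zig-zag mapping is given as a separate helper.

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch only; names are hypothetical, not part of Apache Avro.
class VarIntSketch {

    /** Splits n into 7-bit groups, lowest group first; the top bit of each byte flags "more bytes follow". */
    public static byte[] encode(int n) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((n & ~0x7F) != 0) {          // more than 7 significant bits remain
            out.write((n & 0x7F) | 0x80);   // low 7 bits plus the continuation flag
            n >>>= 7;                       // unsigned shift, so negative values terminate too
        }
        out.write(n);                       // final byte: top bit is 0
        return out.toByteArray();
    }

    /** Zig-zag mapping as described in the Avro specification: small magnitudes map to small unsigned values. */
    public static int zigZag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    public static void main(String[] args) {
        for (byte b : encode(18782)) {
            System.out.printf("%8s%n", Integer.toBinaryString(b & 0xFF));
        }
        System.out.println(zigZag(18782));
    }
}
```

Either way, a value the size of a present-day epoch day fits into three bytes, far below the five-byte limit that triggers the exception.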
So, judging from the code that you have posted, no date that I would reasonably expect to use in an application will pose any problem.
I put your method in a TestBinaryDecoder class to try it out (full code at the end). Let's first see how the exception comes from 5 bytes all having their sign bit set to 1:
try {
    System.out.println(new TestBinaryDecoder(-1, -1, -1, -1, -1).readInt());
} catch (IOException ioe) {
    System.out.println(ioe);
}
Output:
ovv.so.binary.misc.InvalidNumberEncodingException: Invalid int encoding
Also, as you said, 127 poses no problem:
System.out.println(new TestBinaryDecoder(127, -1, -1, -1, -1).readInt());
127
The interesting part comes when we use more bytes to hold the bits of the int that we want. Here the first byte has a sign bit of 1 and the next has 0, so I expect those two bytes to be used:
System.out.println(new TestBinaryDecoder(255, 127, -1, -1, -1).readInt());
16383
We are already getting close to the number needed for today's date. Today is 2021-06-04 in my time zone, day 18782 after the epoch, or in binary: 100100101011110. So let's try putting those 15 binary digits into three bytes for the decoder:
int epochDay = new TestBinaryDecoder(0b11011110, 0b10010010, 0b1, -1, -1).readInt();
System.out.println(epochDay);
System.out.println(LocalDate.ofEpochDay(epochDay));
18782
2021-06-04
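The day number and its binary form used above can be double-checked with java.time alone. This is just a small verification sketch; the class name EpochDayCheck and its helper are made up for illustration and have nothing to do with Avro's own conversion classes.

```java
import java.time.LocalDate;

// Illustrative check only; the class and method names are hypothetical.
class EpochDayCheck {

    /** Days since 1970-01-01, the quantity the date logical type stores as an int. */
    public static int toEpochDay(LocalDate date) {
        return Math.toIntExact(date.toEpochDay());
    }

    public static void main(String[] args) {
        int day = toEpochDay(LocalDate.of(2021, 6, 4));
        System.out.println(day);                                   // 18782
        System.out.println(Integer.toBinaryString(day));           // 100100101011110
        System.out.println(LocalDate.ofEpochDay(day));             // 2021-06-04
        System.out.println(toEpochDay(LocalDate.of(1970, 1, 1)));  // 0
    }
}
```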
So I can't tell how you got your exception. The cause surely isn't just a large int value. The problem must be somewhere else.
import java.io.IOException;
import java.util.stream.IntStream;

public class TestBinaryDecoder {

    private byte[] buf;
    private int pos;

    /** Convenience constructor */
    public TestBinaryDecoder(int... buf) {
        this(toByteArray(buf));
    }

    // Narrows each int to a byte, so 255 and -1 both become 0xFF
    private static byte[] toByteArray(int[] intArray) {
        byte[] byteArray = new byte[intArray.length];
        IntStream.range(0, intArray.length).forEach(ix -> byteArray[ix] = (byte) intArray[ix]);
        return byteArray;
    }

    public TestBinaryDecoder(byte[] buf) {
        this.buf = buf;
        pos = 0;
    }

    // The method from the question, with a return statement in place of the elided tail
    public int readInt() throws IOException {
        this.ensureBounds(5);
        int len = 1;
        int b = this.buf[this.pos] & 255;
        int n = b & 127;
        if (b > 127) {
            b = this.buf[this.pos + len++] & 255;
            n ^= (b & 127) << 7;
            if (b > 127) {
                b = this.buf[this.pos + len++] & 255;
                n ^= (b & 127) << 14;
                if (b > 127) {
                    b = this.buf[this.pos + len++] & 255;
                    n ^= (b & 127) << 21;
                    if (b > 127) {
                        b = this.buf[this.pos + len++] & 255;
                        n ^= (b & 127) << 28;
                        if (b > 127) {
                            throw new InvalidNumberEncodingException("Invalid int encoding");
                        }
                    }
                }
            }
        }
        return n;
    }

    // Stub: only logs the requested bound, performs no real check
    private void ensureBounds(int bounds) {
        System.out.println("Ensuring bounds " + bounds);
    }
}

class InvalidNumberEncodingException extends IOException {
    public InvalidNumberEncodingException(String message) {
        super(message);
    }
}