
Deserializing Avro is slow

I am trying to run a performance test in Java that compares several serialization formats, including Avro, protobuf, and Thrift.

The test deserializes a byte-array message with 30 fields of type long, 1,000,000 times. The result for Avro is not good.

protobuf and Thrift take around 2000 milliseconds on average, but Avro takes 9000 milliseconds.

The documentation advises reusing the decoder, so I wrote the code as follows.

// Load one serialized Market message into memory.
byte[] bytes = readFromFile("market.avro");
long begin = System.nanoTime();
DatumReader<Market> userDatumReader = new ReflectDatumReader<>(Market.class);
InputStream inputStream = new SeekableByteArrayInput(bytes);
// Create the decoder once so that it can be passed back in as 'reuse'.
BinaryDecoder reuse = DecoderFactory.get().binaryDecoder(inputStream, null);
Market marketReuse = new Market();
for (int i = 0; i < loopCount; i++) {
    // Rewind by wrapping the same bytes in a fresh stream for every message.
    inputStream = new SeekableByteArrayInput(bytes);
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(inputStream, reuse);
    userDatumReader.read(marketReuse, decoder);
}

// Elapsed time in nanoseconds, printed below in milliseconds.
long elapsedNanos = System.nanoTime() - begin;
System.out.println("avro loop " + loopCount + " times: " + (elapsedNanos * 1d / 1000 / 1000));

I don't think Avro should be that slow, so I believe I am doing something wrong, but I am not sure what. Am I doing the 'reuse' incorrectly?

Is there any advice for Avro performance testing? Thanks in advance.

Took me a while to figure this one out, but apparently DecoderFactory.get().binaryDecoder is the culprit: it creates an 8 KB buffer every time it is invoked. This buffer is not reused; it is reallocated on every invocation. I don't see any reason why a buffer is involved in the first place.
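To illustrate why an in-memory message does not strictly need an intermediate buffer, here is a minimal sketch that uses no Avro classes at all. It relies only on the binary encoding described in the Avro specification, where a long is written as a zig-zag variable-length integer and a record is simply its fields written in order. The class ZigZagSketch and its methods writeLong and readLongs are hypothetical names made up for this illustration, not part of the Avro API, and a real decoder also has to handle schemas, other types, and malformed input.

```java
import java.util.Arrays;

// Illustration only: hypothetical class, not part of Avro.
class ZigZagSketch {

    // Writes one long in zig-zag varint form at out[pos]; returns the next free position.
    static int writeLong(long value, byte[] out, int pos) {
        long n = (value << 1) ^ (value >> 63); // zig-zag: small magnitudes become small codes
        while ((n & ~0x7FL) != 0) {
            out[pos++] = (byte) ((n & 0x7F) | 0x80); // low 7 bits plus continuation flag
            n >>>= 7;
        }
        out[pos++] = (byte) n; // last byte has no continuation flag
        return pos;
    }

    // Decodes fieldCount longs straight from the array, with no intermediate buffer.
    static long[] readLongs(byte[] data, int fieldCount) {
        long[] fields = new long[fieldCount];
        int pos = 0;
        for (int i = 0; i < fieldCount; i++) {
            long n = 0;
            int shift = 0;
            int b;
            do {
                b = data[pos++] & 0xFF;
                n |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            fields[i] = (n >>> 1) ^ -(n & 1); // undo zig-zag
        }
        return fields;
    }

    public static void main(String[] args) {
        // A message shaped like the one in the question: 30 long fields (values are made up).
        long[] message = new long[30];
        for (int i = 0; i < message.length; i++) {
            message[i] = (i % 2 == 0 ? 1 : -1) * (1000L * i + 7);
        }
        byte[] encoded = new byte[message.length * 10]; // a long needs at most 10 bytes
        int pos = 0;
        for (long field : message) {
            pos = writeLong(field, encoded, pos);
        }
        long[] decoded = readLongs(encoded, message.length);
        System.out.println("round trip ok: " + Arrays.equals(message, decoded));
    }
}
```

The only per-message state in readLongs is the position index, which is why a fresh 8 KB allocation per call is pure overhead when the bytes are already in memory.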

The saner alternative is to use DecoderFactory.get().directBinaryDecoder.
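A minimal sketch of the loop from the question rewritten around directBinaryDecoder, assuming the Avro library is on the classpath. The three-field Market class here is a hypothetical stand-in for the asker's 30-field class, and the encode helper exists only to produce sample bytes instead of reading market.avro. Timings depend on the machine and JVM, so none are claimed here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.reflect.ReflectDatumReader;
import org.apache.avro.reflect.ReflectDatumWriter;

// Hypothetical stand-in for the question's Market class (the real one has 30 long fields).
class Market {
    long bid;
    long ask;
    long volume;
}

class DirectDecoderSketch {

    // Helper for this sketch only: serializes one Market to Avro binary.
    static byte[] encode(Market market) throws IOException {
        DatumWriter<Market> writer = new ReflectDatumWriter<>(Market.class);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().directBinaryEncoder(out, null);
        writer.write(market, encoder);
        encoder.flush();
        return out.toByteArray();
    }

    // Decodes the same bytes loopCount times, reusing the reader, the decoder, and the record.
    static Market decodeRepeatedly(byte[] bytes, int loopCount) throws IOException {
        DatumReader<Market> reader = new ReflectDatumReader<>(Market.class);
        Market marketReuse = new Market();
        BinaryDecoder decoder = null; // created on the first call, reconfigured afterwards
        for (int i = 0; i < loopCount; i++) {
            decoder = DecoderFactory.get()
                    .directBinaryDecoder(new ByteArrayInputStream(bytes), decoder);
            marketReuse = reader.read(marketReuse, decoder);
        }
        return marketReuse;
    }

    public static void main(String[] args) throws IOException {
        Market sample = new Market();
        sample.bid = 101L;
        sample.ask = 103L;
        sample.volume = 5_000L;
        byte[] bytes = encode(sample);

        int loopCount = 1_000_000;
        long begin = System.nanoTime();
        Market last = decodeRepeatedly(bytes, loopCount);
        long elapsedNanos = System.nanoTime() - begin;
        System.out.println("avro loop " + loopCount + " times: "
                + (elapsedNanos / 1_000_000.0) + " ms, last bid = " + last.bid);
    }
}
```

Passing the previous decoder back in as the second argument lets the factory reconfigure it for the new stream rather than construct a new one on each iteration.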

