Deserializing Avro is slow

I am running a performance test in Java that compares several serialization formats, including Avro, protobuf, and Thrift.

The test deserializes a byte-array message containing 30 fields of type long, 1,000,000 times. The result for Avro is not good.

protobuf and Thrift each take around 2000 milliseconds on average, but Avro takes 9000 milliseconds.

The documentation advises reusing the decoder, so I wrote the code as follows.

byte[] bytes = readFromFile("market.avro");
long begin = System.nanoTime();
DatumReader<Market> userDatumReader = new ReflectDatumReader<>(Market.class);
InputStream inputStream = new SeekableByteArrayInput(bytes);
// Initial decoder, passed as the 'reuse' argument on every iteration below
BinaryDecoder reuse = DecoderFactory.get().binaryDecoder(inputStream, null);
Market marketReuse = new Market();
for (int i = 0; i < loopCount; i++) {
    // A fresh stream over the same bytes for each pass
    inputStream = new SeekableByteArrayInput(bytes);
    BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(inputStream, reuse);
    userDatumReader.read(marketReuse, decoder);
}

// Elapsed time in nanoseconds, printed in milliseconds
long elapsedNanos = System.nanoTime() - begin;
System.out.println("avro loop " + loopCount + " times: " + (elapsedNanos * 1d / 1000 / 1000));

I don't think Avro should be that slow, so I believe I am doing something wrong, but I am not sure what. Am I using 'reuse' in the wrong way?

Is there any advice for Avro performance testing? Thanks in advance.

Took me a while to figure this one out, but apparently DecoderFactory.get().binaryDecoder is the culprit: it creates an 8 KB buffer every time it is invoked. This buffer is not reused but reallocated on every invocation. I don't see any reason why there is a buffer involved in the first place.

The saner alternative is to use DecoderFactory.get().directBinaryDecoder, which reads straight from the input stream.
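
A minimal sketch of the loop from the question rewritten around directBinaryDecoder. It assumes Avro is on the classpath; the Market class here is a hypothetical three-field stand-in for the 30-field class in the question, and the class and method names (DirectDecoderSketch, encode, decodeRepeatedly) are made up for illustration. The message is encoded in memory rather than read from market.avro so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.reflect.ReflectDatumReader;
import org.apache.avro.reflect.ReflectDatumWriter;

public class DirectDecoderSketch {

    // Hypothetical stand-in for the question's Market class.
    public static class Market {
        public long bid;
        public long ask;
        public long volume;
    }

    // Encodes one Market into Avro binary so the sketch needs no input file.
    static byte[] encode(Market market) throws IOException {
        DatumWriter<Market> writer = new ReflectDatumWriter<>(Market.class);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(market, encoder);
        encoder.flush();
        return out.toByteArray();
    }

    // Decodes the same bytes loopCount times, reusing the reader,
    // the decoder, and the target object across iterations.
    static Market decodeRepeatedly(byte[] bytes, int loopCount) throws IOException {
        DatumReader<Market> reader = new ReflectDatumReader<>(Market.class);
        Market reuse = new Market();
        BinaryDecoder decoder = null; // null on the first pass, reused afterwards
        for (int i = 0; i < loopCount; i++) {
            decoder = DecoderFactory.get()
                    .directBinaryDecoder(new ByteArrayInputStream(bytes), decoder);
            reuse = reader.read(reuse, decoder);
        }
        return reuse;
    }

    public static void main(String[] args) throws IOException {
        Market market = new Market();
        market.bid = 101L;
        market.ask = 102L;
        market.volume = 5000L;
        byte[] bytes = encode(market);

        int loopCount = 1_000_000;
        long begin = System.nanoTime();
        decodeRepeatedly(bytes, loopCount);
        long elapsedNanos = System.nanoTime() - begin;
        System.out.println("avro loop " + loopCount + " times: "
                + (elapsedNanos / 1_000_000.0) + " ms");
    }
}
```

Timings will vary by machine and Avro version, so the figures from the question are not reproduced here. Since the input is already a byte[], the binaryDecoder(byte[], BinaryDecoder) overload may also be worth measuring, as it decodes from the array without wrapping it in a stream.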
