
Producer (Java client) performance drops when message size is very large (~100 MB)

1. In my application, which sends data over a TCP connection (Kafka producer), I observed a drastic performance drop when the message size grows from 1 MB to 100 MB (140 MB/sec --> 25 MB/sec, batch size = 1).

I profiled the producer process and found one suspicious point: the method 'copyFromArray' in Bits.java consumes most of the time. (The code is as follows.)

static final long UNSAFE_COPY_THRESHOLD = 1024L * 1024L;

static void copyFromArray(Object src, long srcBaseOffset, long srcPos,
                          long dstAddr, long length)
{
    long offset = srcBaseOffset + srcPos;
    while (length > 0) {
        long size = (length > UNSAFE_COPY_THRESHOLD) ? UNSAFE_COPY_THRESHOLD : length;
        unsafe.copyMemory(src, offset, null, dstAddr, size);
        length -= size;
        offset += size;
        dstAddr += size;
    }
}

Reference: http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/java/nio/Bits.java

2. Interestingly, this problem occurs only when I use the Java producer client. It does not occur when I use the Scala implementation, which I cannot explain.

Where should I start looking to find the problem?

Kafka's optimal message size is around 1 KB. Once your messages grow beyond about 10 MB, you start to suffer performance problems. In your case, the message size is around 100 MB, which is definitely a no-no.

You have to ask yourself whether sending such big messages is necessary. Kafka is an event pub-sub system, not an FTP server. If you need to send a large file, you can put the file in a shared location and send only its URL as the message through Kafka. If that does not work, another workaround is to code your producer to break large messages into multiple pieces that share the same key. This guarantees that all pieces with that key end up on the same partition, so you can assemble them back into the original message on the consumer side. Enabling compression also reduces the size of your messages, which can improve performance.
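The chunking workaround described above can be sketched roughly as follows. This is only a minimal illustration in plain Java with no Kafka dependency: the class name MessageChunker, the methods split and reassemble, and the 512 KB CHUNK_SIZE are all hypothetical choices, not part of any Kafka API. In a real producer, each chunk would presumably be sent as its own record with the same key, and a real consumer would likely also need a chunk index and total count (omitted here) to detect missing or reordered pieces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MessageChunker {
    // Hypothetical chunk size; choose a value well below the broker's message size limit.
    public static final int CHUNK_SIZE = 512 * 1024;

    /** Splits the payload into consecutive pieces of at most chunkSize bytes. */
    public static List<byte[]> split(byte[] payload, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int start = 0; start < payload.length; start += chunkSize) {
            int end = Math.min(payload.length, start + chunkSize);
            chunks.add(Arrays.copyOfRange(payload, start, end));
        }
        return chunks;
    }

    /** Concatenates the chunks in the order given to rebuild the original payload. */
    public static byte[] reassemble(List<byte[]> chunks) {
        int total = 0;
        for (byte[] chunk : chunks) {
            total += chunk.length;
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] chunk : chunks) {
            System.arraycopy(chunk, 0, out, pos, chunk.length);
            pos += chunk.length;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[3 * CHUNK_SIZE + 100];
        for (int i = 0; i < payload.length; i++) {
            payload[i] = (byte) i;
        }
        List<byte[]> chunks = split(payload, CHUNK_SIZE);
        // Assumption: in a real producer, each chunk would be sent here as a
        // separate record carrying the same key, so all pieces share a partition.
        byte[] restored = reassemble(chunks);
        System.out.println(chunks.size() + " chunks, round trip ok: "
                + Arrays.equals(payload, restored));
    }
}
```

Keeping the split and reassemble logic free of any Kafka types makes it easy to test on its own before wiring it into the producer and consumer.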

In short, you should avoid sending large messages (>10 MB) through Kafka.
