
Batch Size Problem with MapR Streams Kafka API

Hello, I am using the Kafka MapR Streams API to receive events from a MapR Streams topic.

I am trying to increase the batch size of my consumer, but I never get more than 30 messages in one batch!

A single event is about 5000 bytes in size. If the event is smaller I get more in one batch.

Here is my configuration of the Consumer:

public static void main(String[] args) {
    final Properties props = new Properties();
    props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "");
    props.put(ConsumerConfig.GROUP_ID_CONFIG, "batchSize");
    props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
    props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
    props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
    props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, 50000);
    props.put(ConsumerConfig.RECEIVE_BUFFER_CONFIG, 26214400);
    props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 100 * 1024 * 1024);
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 1000);

    Consumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Collections.singletonList(TOPIC));
    long totalCount = 0;
    long start = System.currentTimeMillis();
    long countTimesNoMessages = 0;

    // Stop after ten consecutive empty polls
    while (countTimesNoMessages < 10) {
        ConsumerRecords<String, String> records = consumer.poll(1000);
        totalCount += records.count();
        System.out.println(records.count());
        if (records.count() == 0) {
            countTimesNoMessages++;
        }
    }

    long end = System.currentTimeMillis();
    System.out.println((end - start) + " for " + totalCount + " messages");
}

The possible configuration parameters are documented here:

https://mapr.com/docs/61/MapR_Streams/configuration-parameters.html

Notice that fetch.max.bytes is the overall maximum: the sum of max.partition.fetch.bytes across all partitions cannot exceed fetch.max.bytes.

It is common to raise max.partition.fetch.bytes so that more than the 64 KB default is polled from each partition, and to raise fetch.max.bytes accordingly so that the larger per-partition limit can actually take effect.
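As a rough illustration of sizing these two settings together, here is a minimal sketch. The event size (5000 bytes) and target batch size (1000 records) come from the question; the partition count of 4 is an assumption for the example, as is the helper class name.

```java
import java.util.Properties;

public class FetchSizing {
    // Per-partition fetch limit sized so one poll can return ~targetRecords events.
    static int maxPartitionFetchBytes(int eventSizeBytes, int targetRecords) {
        return eventSizeBytes * targetRecords;
    }

    // fetch.max.bytes must cover the per-partition limit summed over all partitions.
    static int fetchMaxBytes(int perPartitionBytes, int partitionCount) {
        return perPartitionBytes * partitionCount;
    }

    public static void main(String[] args) {
        int perPartition = maxPartitionFetchBytes(5000, 1000); // 5,000,000 bytes
        int total = fetchMaxBytes(perPartition, 4);            // assumed 4 partitions

        Properties props = new Properties();
        props.put("max.partition.fetch.bytes", Integer.toString(perPartition));
        props.put("fetch.max.bytes", Integer.toString(total));
        System.out.println(perPartition + " " + total);
    }
}
```

With these numbers, each partition could deliver about 1000 of the 5000-byte events per fetch, and the overall fetch limit is large enough that the per-partition limit is not silently capped.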

You probably shouldn't set the batch size too large, though. Once the frequency of poll requests drops below a few hundred per second or so, you are very unlikely to see further performance improvements, and you are much more likely to run into hotspots or large amounts of redone work if a consumer thread fails.
