Spark Streaming with Kafka: only one consumer is reading the data

I am using Spark Streaming with Kafka, and I have a topic with 20 partitions. When the streaming job runs, only one consumer reads the data from all of the topic's partitions, which slows down reading. Is there a way to configure one consumer per partition in Spark Streaming?

// Obtain the streaming context and Kafka consumer configuration (application-specific helpers).
JavaStreamingContext jsc = AnalyticsContext.getInstance().getSparkStreamContext();
Map<String, Object> kafkaParams = MessageSessionFactory.getConsumerConfigParamsMap(
        MessageSessionFactory.DEFAULT_CLUSTER_IDENTITY, consumerGroup);

// Subscribe to the comma-separated list of topics with a direct stream.
String[] topics = topic.split(",");
Collection<String> topicCollection = Arrays.asList(topics);
metricStream = KafkaUtils.createDirectStream(
        jsc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.Subscribe(topicCollection, kafkaParams)
);

TOPIC             PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG             CONSUMER-ID                                     HOST            CLIENT-ID
metric_data_spark 16         3379403197      3379436869      33672           consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx    consumer-2
metric_data_spark 7          3399030625      3399065857      35232           consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx    consumer-2
metric_data_spark 13         3389008901      3389044210      35309           consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx    consumer-2
metric_data_spark 17         3380638947      3380639928      981             consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx    consumer-2
metric_data_spark 1          3593201424      3593236844      35420           consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx    consumer-2
metric_data_spark 8          3394218406      3394252084      33678           consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx    consumer-2
metric_data_spark 19         3376897309      3376917998      20689           consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx    consumer-2
metric_data_spark 3          3447204634      3447240071      35437           consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx    consumer-2
metric_data_spark 18         3375082623      3375083663      1040            consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx    consumer-2
metric_data_spark 2          3433294129      3433327970      33841           consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx    consumer-2
metric_data_spark 9          3396324976      3396345705      20729           consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx    consumer-2
metric_data_spark 0          3582591157      3582624892      33735           consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx    consumer-2
metric_data_spark 14         3381779702      3381813477      33775           consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx    consumer-2
metric_data_spark 4          3412492002      3412525779      33777           consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx    consumer-2
metric_data_spark 11         3393158700      3393179419      20719           consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx    consumer-2
metric_data_spark 10         3392216079      3392235071      18992           consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx    consumer-2
metric_data_spark 15         3383001380      3383036803      35423           consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx    consumer-2
metric_data_spark 6          3398338540      3398372367      33827           consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx    consumer-2
metric_data_spark 12         3387738477      3387772279      33802           consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx    consumer-2
metric_data_spark 5          3408698217      3408733614      35397           consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx    consumer-2

What changes do we need to make so that one consumer per partition reads the data?

Since you are using the consistent location strategy (PreferConsistent), the partitions should be distributed evenly over the executors.

When you run spark-submit, you need to specify that you want at most 20 executors to be started: --num-executors 20

If you ask for more than that, though, you'll have idle executors that consume no Kafka data (they may still be able to process other stages).
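As a rough sketch (not taken from the question's code), the executor count can also be requested programmatically: spark.executor.instances is the configuration counterpart of the --num-executors flag. The class name, app name, and batch interval below are hypothetical; the stream itself would be created exactly as in the question.

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class MetricStreamingJob {
    public static void main(String[] args) throws Exception {
        // Equivalent to passing "--num-executors 20" to spark-submit on YARN:
        // with 20 executors available, each of the topic's 20 partitions can be
        // consumed by its own task instead of all partitions landing on one consumer.
        SparkConf conf = new SparkConf()
                .setAppName("metric-data-spark")           // hypothetical app name
                .set("spark.executor.instances", "20")     // match the topic's 20 partitions
                .set("spark.executor.cores", "1");         // one core per executor (optional)

        JavaStreamingContext jsc = new JavaStreamingContext(conf, Durations.seconds(30));

        // ... build the direct stream as shown in the question, then:
        jsc.start();
        jsc.awaitTermination();
    }
}
```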
