[英]spark streaming with kafka one consumer is reading the data
我正在使用kafka進行火花蒸煮,我有一個包含20個分區的主題。 當流作業運行時,只有一個使用者從所有主題中讀取數據,這導致讀取數據的速度變慢。 有沒有辦法在火花蒸汽中為每個分區配置一個消費者。
JavaStreamingContext jsc = AnalyticsContext.getInstance().getSparkStreamContext();
Map<String, Object> kafkaParams = MessageSessionFactory.getConsumerConfigParamsMap(MessageSessionFactory.DEFAULT_CLUSTER_IDENTITY, consumerGroup);
String[] topics = topic.split(",");
Collection<String> topicCollection = Arrays.asList(topics);
metricStream = KafkaUtils.createDirectStream(
jsc,
LocationStrategies.PreferConsistent(),
ConsumerStrategies.Subscribe(topicCollection, kafkaParams)
);
}
TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
metric_data_spark 16 3379403197 3379436869 33672 consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx consumer-2
metric_data_spark 7 3399030625 3399065857 35232 consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx consumer-2
metric_data_spark 13 3389008901 3389044210 35309 consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx consumer-2
metric_data_spark 17 3380638947 3380639928 981 consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx consumer-2
metric_data_spark 1 3593201424 3593236844 35420 consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx consumer-2
metric_data_spark 8 3394218406 3394252084 33678 consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx consumer-2
metric_data_spark 19 3376897309 3376917998 20689 consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx consumer-2
metric_data_spark 3 3447204634 3447240071 35437 consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx consumer-2
metric_data_spark 18 3375082623 3375083663 1040 consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx consumer-2
metric_data_spark 2 3433294129 3433327970 33841 consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx consumer-2
metric_data_spark 9 3396324976 3396345705 20729 consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx consumer-2
metric_data_spark 0 3582591157 3582624892 33735 consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx consumer-2
metric_data_spark 14 3381779702 3381813477 33775 consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx consumer-2
metric_data_spark 4 3412492002 3412525779 33777 consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx consumer-2
metric_data_spark 11 3393158700 3393179419 20719 consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx consumer-2
metric_data_spark 10 3392216079 3392235071 18992 consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx consumer-2
metric_data_spark 15 3383001380 3383036803 35423 consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx consumer-2
metric_data_spark 6 3398338540 3398372367 33827 consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx consumer-2
metric_data_spark 12 3387738477 3387772279 33802 consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx consumer-2
metric_data_spark 5 3408698217 3408733614 35397 consumer-2-da278f31-c368-414c-925b-d3ca4881709e /xx.xx.xx.xx consumer-2
我們需要做的更改使每個使用者/每個分區讀取數據。
由於您使用的是一致的展示位置策略,因此應將其分配給執行者
運行Spark提交時,需要指定最多要啟動20個執行程序。 --num-executors 20
但是,如果您做得更多,那么您將有閑置的執行程序不使用Kafka數據(但他們仍然可以處理其他階段)
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.