
Spark Streaming Kafka: Unknown error fetching data for topic-partition

I'm trying to read a Kafka topic from a Spark cluster using the Structured Streaming API with Kafka integration in Spark:

val sparkSession = SparkSession.builder()
  .master("local[*]")
  .appName("some-app")
  .getOrCreate()

Kafka stream creation:

import sparkSession.implicits._

val dataFrame = sparkSession
  .readStream
  .format("kafka")
  .option("subscribepattern", "preprod-*")
  .option("kafka.bootstrap.servers", "<brokerUrl>:9094")
  .option("kafka.ssl.protocol", "TLS")
  .option("kafka.security.protocol", "SSL")
  .option("kafka.ssl.key.password", secretPassword)
  .option("kafka.ssl.keystore.location", "/tmp/xyz.jks")
  .option("kafka.ssl.keystore.password", secretPassword)
  .option("kafka.ssl.truststore.location", "/abc.jks")
  .option("kafka.ssl.truststore.password", secretPassword)
  .load()
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .as[(String, String)]
  .writeStream
  .format("console")
  .start()
  .awaitTermination()

Running it using the command:

/usr/local/spark/bin/spark-submit \
  --packages "org.apache.spark:spark-streaming-kafka-0-10_2.11:2.3.1,org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1" \
  myjar.jar

Getting the below error:

2018-09-28 07:29:23 INFO  AbstractCoordinator:505 - Discovered coordinator brokerUrl.com:32400 (id: 2147483647 rack: null) for group spark-kafka-source-c72dcb79-f3bc-4dfd-86a5-9d14be48fa04-1188588017-executor.
2018-09-28 07:29:23 INFO  AbstractCoordinator:505 - Discovered coordinator brokerUrl.com:32400 (id: 2147483647 rack: null) for group spark-kafka-source-c72dcb79-f3bc-4dfd-86a5-9d14be48fa04-1188588017-executor.
2018-09-28 07:29:23 INFO  AbstractCoordinator:505 - Discovered coordinator brokerUrl.com:32400 (id: 2147483647 rack: null) for group spark-kafka-source-c72dcb79-f3bc-4dfd-86a5-9d14be48fa04-1188588017-executor.
2018-09-28 07:29:23 INFO  AbstractCoordinator:505 - Discovered coordinator brokerUrl.com:32400 (id: 2147483647 rack: null) for group spark-kafka-source-c72dcb79-f3bc-4dfd-86a5-9d14be48fa04-1188588017-executor.
2018-09-28 07:29:47 WARN  Fetcher:594 - Unknown error fetching data for topic-partition preprod-sanity-test-5
2018-09-28 07:30:25 WARN  Fetcher:594 - Unknown error fetching data for topic-partition preprod-sanity-test-7
2018-09-28 07:30:27 WARN  Fetcher:594 - Unknown error fetching data for topic-partition preprod-sanity-test-7
2018-09-28 07:30:27 WARN  Fetcher:594 - Unknown error fetching data for topic-partition preprod-sanity-test-5
2018-09-28 07:30:50 WARN  Fetcher:594 - Unknown error fetching data for topic-partition preprod-sanity-test-8
2018-09-28 07:30:50 WARN  Fetcher:594 - Unknown error fetching data for topic-partition preprod-sanity-test-4
2018-09-28 07:30:50 WARN  Fetcher:594 - Unknown error fetching data for topic-partition preprod-sanity-test-7
2018-09-28 07:30:50 WARN  Fetcher:594 - Unknown error fetching data for topic-partition preprod-sanity-test-8
2018-09-28 07:30:50 WARN  Fetcher:594 - Unknown error fetching data for topic-partition preprod-sanity-test-4
2018-09-28 07:30:50 WARN  Fetcher:594 - Unknown error fetching data for topic-partition preprod-sanity-test-5
.....
....
so on

What's your Kafka broker version? And how did you generate these messages?

If these messages have headers ( https://issues.apache.org/jira/browse/KAFKA-4208 ), you will need a Kafka 0.11+ client to consume them, as older Kafka clients cannot read such messages. If so, you can use the following command:

/usr/local/spark/bin/spark-submit \
  --packages "org.apache.kafka:kafka-clients:0.11.0.3,org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1" \
  myjar.jar
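
For reference, a minimal sketch (not part of the original answer) of how you might confirm whether the records actually carry headers, using the plain kafka-clients 0.11+ consumer API. The topic name preprod-sanity-test is taken from the log output, and the SSL settings and secretPassword placeholder are assumed to mirror the question's configuration; adjust them for your environment.

import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import scala.collection.JavaConverters._

val props = new Properties()
props.put("bootstrap.servers", "<brokerUrl>:9094")
props.put("group.id", "header-check")          // throwaway consumer group for this check
props.put("auto.offset.reset", "earliest")     // read existing messages, not just new ones
props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer")
// SSL settings assumed to mirror the Spark options in the question
props.put("security.protocol", "SSL")
props.put("ssl.truststore.location", "/abc.jks")
props.put("ssl.truststore.password", secretPassword)
props.put("ssl.keystore.location", "/tmp/xyz.jks")
props.put("ssl.keystore.password", secretPassword)
props.put("ssl.key.password", secretPassword)

val consumer = new KafkaConsumer[Array[Byte], Array[Byte]](props)
consumer.subscribe(Collections.singletonList("preprod-sanity-test"))
val records = consumer.poll(5000L) // poll(long) is the signature available in 0.11.x
for (record <- records.asScala) {
  // headers() exists only on 0.11+ ConsumerRecord; older clients cannot see them
  val headers = record.headers().asScala
    .map(h => s"${h.key()}=${new String(h.value())}")
    .mkString(", ")
  println(s"partition=${record.partition()} offset=${record.offset()} headers=[$headers]")
}
consumer.close()

If header keys are printed when running this with kafka-clients 0.11.0+ on the classpath, the dependency bump in the command above is the relevant fix.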
