Spark Job with Kafka on Kubernetes
We have a Spark Java application that reads from a database and publishes messages to Kafka. When we execute the job locally on the Windows command line with the following arguments, it works as expected:
bin/spark-submit --class com.data.ingestion.DataIngestion --jars local:///opt/spark/jars/spark-sql-kafka-0-10_2.11-2.3.0.jar local:///opt/spark/jars/data-ingestion-1.0-SNAPSHOT.jar
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0 --class com.data.ingestion.DataIngestion data-ingestion-1.0-SNAPSHOT.jar
Similarly, when we try to run the command using the k8s master:
bin/spark-submit --master k8s://https://172.16.3.105:8443 --deploy-mode cluster --conf spark.kubernetes.container.image=localhost:5000/spark-example:0.2 --class com.data.ingestion.DataIngestion --jars local:///opt/spark/jars/spark-sql-kafka-0-10_2.11-2.3.0.jar local:///opt/spark/jars/data-ingestion-1.0-SNAPSHOT.jar
It gives the following error:
Exception in thread "main" java.util.ServiceConfigurationError:
org.apache.spark.sql.sources.DataSourceRegister: Provider
org.apache.spark.sql.kafka010.KafkaSourceProvider could not be instantiated
Based on the error, it would indicate that at least one node in the cluster does not have /opt/spark/jars/spark-sql-kafka-0-10_2.11-2.3.0.jar.
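One way to verify this is to check the container image that the submit command references for the jar. A hedged sketch (the image tag comes from the question; the pod name is a placeholder you would fill in from `kubectl get pods`):

```
# Inspect the image used by spark.kubernetes.container.image
docker run --rm localhost:5000/spark-example:0.2 ls /opt/spark/jars

# Or, if a driver pod is already running, look inside it directly
kubectl exec <driver-pod> -- ls /opt/spark/jars
```

If spark-sql-kafka-0-10_2.11-2.3.0.jar is missing from that listing, the `local:///` path in the submit command points at a file that does not exist inside the container, which matches the ServiceConfigurationError.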
I suggest you either create an uber jar that includes this Kafka Structured Streaming package, or use --packages rather than local files. Additionally, you could set up a solution like Rook or MinIO to provide a shared filesystem within k8s/Spark.
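For the uber-jar route, a minimal sketch of a Maven Shade plugin configuration (assuming the project already builds data-ingestion-1.0-SNAPSHOT.jar with Maven; the dependency coordinates are the ones from the question). Note the ServicesResourceTransformer: DataSourceRegister providers are discovered via META-INF/services files, and without merging those files a shaded jar can throw the very same ServiceConfigurationError:

```
<!-- Bundle the Kafka source (do NOT mark it provided) so it ends up in the uber jar -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
  <version>2.3.0</version>
</dependency>

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- Merge META-INF/services entries so the KafkaSourceProvider
               registration survives shading -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```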
It seems the Scala version and the Spark Kafka package version are not aligned.
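For reference, the package coordinate encodes both versions: in org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0, the `_2.11` suffix is the Scala version and `2.3.0` is the Spark version, and both must match the Spark build inside the container image. A hedged sketch of the k8s submit using --packages instead of a local jar (master URL, image tag, and class name taken from the question):

```
bin/spark-submit \
  --master k8s://https://172.16.3.105:8443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=localhost:5000/spark-example:0.2 \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0 \
  --class com.data.ingestion.DataIngestion \
  local:///opt/spark/jars/data-ingestion-1.0-SNAPSHOT.jar
```

With --packages, the driver resolves the Kafka connector (and its transitive dependencies) from Maven Central rather than relying on the jar being baked into every image.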