Spark Job with Kafka on Kubernetes
We have a Spark Java application that reads from a database and publishes messages to Kafka. When we execute the job locally on the Windows command line with the following arguments, it works as expected:
bin/spark-submit --class com.data.ingestion.DataIngestion --jars local:///opt/spark/jars/spark-sql-kafka-0-10_2.11-2.3.0.jar local:///opt/spark/jars/data-ingestion-1.0-SNAPSHOT.jar
spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0 --class com.data.ingestion.DataIngestion data-ingestion-1.0-SNAPSHOT.jar
Similarly, when we try to run the command using the k8s master:
bin/spark-submit --master k8s://https://172.16.3.105:8443 --deploy-mode cluster --conf spark.kubernetes.container.image=localhost:5000/spark-example:0.2 --class com.data.ingestion.DataIngestion --jars local:///opt/spark/jars/spark-sql-kafka-0-10_2.11-2.3.0.jar local:///opt/spark/jars/data-ingestion-1.0-SNAPSHOT.jar
It gives the following error:
Exception in thread "main" java.util.ServiceConfigurationError:
org.apache.spark.sql.sources.DataSourceRegister: Provider
org.apache.spark.sql.kafka010.KafkaSourceProvider could not be instantiated
Based on the error, it would indicate that at least one node in the cluster does not have /opt/spark/jars/spark-sql-kafka-0-10_2.11-2.3.0.jar.
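One way to verify this is to check the container image that the submit command references for the jar. A hedged sketch (the image tag comes from the question; the pod name is a placeholder you would fill in from `kubectl get pods`):

```
# Inspect the image used by spark.kubernetes.container.image
docker run --rm localhost:5000/spark-example:0.2 ls /opt/spark/jars

# Or, if a driver pod is already running, look inside it directly
kubectl exec <driver-pod> -- ls /opt/spark/jars
```

If spark-sql-kafka-0-10_2.11-2.3.0.jar is missing from that listing, the `local:///` path in the submit command points at a file that does not exist inside the container, which matches the ServiceConfigurationError.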
I suggest you either create an uber jar that includes this Kafka Structured Streaming package, or use --packages rather than local files. Additionally, you could set up a solution like Rook or MinIO to provide a shared filesystem within k8s/Spark.
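For the uber-jar route, a minimal sketch of a Maven Shade plugin configuration (assuming the project already builds data-ingestion-1.0-SNAPSHOT.jar with Maven; the dependency coordinates are the ones from the question). Note the ServicesResourceTransformer: DataSourceRegister providers are discovered via META-INF/services files, and without merging those files a shaded jar can throw the very same ServiceConfigurationError:

```
<!-- Bundle the Kafka source (do NOT mark it provided) so it ends up in the uber jar -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
  <version>2.3.0</version>
</dependency>

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <transformers>
          <!-- Merge META-INF/services entries so the KafkaSourceProvider
               registration survives shading -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```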
It seems the Scala version and the Spark Kafka package version are not aligned.
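For reference, the package coordinate encodes both versions: in org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0, the `_2.11` suffix is the Scala version and `2.3.0` is the Spark version, and both must match the Spark build inside the container image. A hedged sketch of the k8s submit using --packages instead of a local jar (master URL, image tag, and class name taken from the question):

```
bin/spark-submit \
  --master k8s://https://172.16.3.105:8443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.container.image=localhost:5000/spark-example:0.2 \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0 \
  --class com.data.ingestion.DataIngestion \
  local:///opt/spark/jars/data-ingestion-1.0-SNAPSHOT.jar
```

With --packages, the driver resolves the Kafka connector (and its transitive dependencies) from Maven Central rather than relying on the jar being baked into every image.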