SparkContext: Error initializing SparkContext While Running Spark Job

I'm writing a Spark program that loads data from Elasticsearch to HDFS, but the job fails with "Error initializing SparkContext". The error occurs while creating the Spark session.

Hadoop: 3.2.1

Spark: 2.4.4

Elasticsearch Spark (for Spark 2.X): 7.5.1

EMR: 6.0.0

Code:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession \
   .builder \
   .getOrCreate()

Error Log:

20/10/22 10:09:12 ERROR SparkContext: Error initializing SparkContext.
java.util.ServiceConfigurationError: org.apache.spark.deploy.yarn.security.ServiceCredentialProvider: Provider org.elasticsearch.spark.deploy.yarn.security.EsServiceCredentialProvider could not be instantiated
    at java.util.ServiceLoader.fail(ServiceLoader.java:232)
    at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
    at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
    at scala.collection.TraversableLike.to(TraversableLike.scala:678)
    at scala.collection.TraversableLike.to$(TraversableLike.scala:675)
    at scala.collection.AbstractTraversable.to(Traversable.scala:108)
    at scala.collection.TraversableOnce.toList(TraversableOnce.scala:299)
    at scala.collection.TraversableOnce.toList$(TraversableOnce.scala:299)
    at scala.collection.AbstractTraversable.toList(Traversable.scala:108)
    at org.apache.spark.deploy.yarn.security.YARNHadoopDelegationTokenManager.loadCredentialProviders(YARNHadoopDelegationTokenManager.scala:82)
    at org.apache.spark.deploy.yarn.security.YARNHadoopDelegationTokenManager.getCredentialProviders(YARNHadoopDelegationTokenManager.scala:73)
    at org.apache.spark.deploy.yarn.security.YARNHadoopDelegationTokenManager.<init>(YARNHadoopDelegationTokenManager.scala:46)
    at org.apache.spark.deploy.yarn.Client.setupSecurityToken(Client.scala:308)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:1013)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:178)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:183)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/security/ServiceCredentialProvider$class
    at org.elasticsearch.spark.deploy.yarn.security.EsServiceCredentialProvider.<init>(EsServiceCredentialProvider.scala:63)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.Class.newInstance(Class.java:442)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
    ... 40 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.security.ServiceCredentialProvider$class
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
    ... 47 more
20/10/22 10:09:12 INFO SparkUI: Stopped Spark web UI at http://ip-172-31-1-155.us-east-2.test:4040
20/10/22 10:09:12 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
20/10/22 10:09:12 INFO YarnClientSchedulerBackend: Stopped
20/10/22 10:09:12 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/10/22 10:09:12 INFO MemoryStore: MemoryStore cleared
20/10/22 10:09:12 INFO BlockManager: BlockManager stopped
20/10/22 10:09:12 INFO BlockManagerMaster: BlockManagerMaster stopped
20/10/22 10:09:12 WARN MetricsSystem: Stopping a MetricsSystem that is not running
20/10/22 10:09:12 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/10/22 10:09:12 INFO SparkContext: Successfully stopped SparkContext
20/10/22 10:09:13 INFO ShutdownHookManager: Shutdown hook called
20/10/22 10:09:13 INFO ShutdownHookManager: Deleting directory /vol1/tmp/spark-b39bb8cc-5bc7-4721-89bd-8bd62b9e527e
20/10/22 10:09:13 INFO ShutdownHookManager: Deleting directory /vol1/tmp/spark-d94995f0-05b6-476f-935e-8ba501acbed3

    at com.company.utils.ResourceScriptUtils.executeScript(ResourceScriptUtils.java:114)
    at com.company.utils.ResourceScriptUtils.executeScript(ResourceScriptUtils.java:135)
    at com.company.loader.impl.realTimeProcessing.RealTimeEsLoader.processJob(RealTimeEsLoader.java:232)
    at com.company.loader.App.main(App.java:37)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
    at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
    at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:166)
    at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:153)
    at com.company.multijob.MultiJob$HadoopJob.call(MultiJob.java:50)
    at com.company.multijob.MultiJob$HadoopJob.call(MultiJob.java:38)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Could someone please help? Thanks.

This happens because the Spark-YARN JARs are missing from your Spark application. If you are using Maven, add the following to your pom.xml.

  1. Under dependencies:
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-yarn_2.11</artifactId>
        <version>2.4.7</version>
    </dependency>
  2. Under artifactItems (a sketch of the surrounding plugin block follows the note below):
    <artifactItem>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-yarn_2.11</artifactId>
        <version>2.4.7</version>
        <type>jar</type>
        <overWrite>false</overWrite>
        <outputDirectory>${project.build.directory}/classes/</outputDirectory>
        <destFileName>optional-new-name.jar</destFileName>
    </artifactItem>

Note: Change your Scala and Spark versions accordingly; here I'm using Spark 2.4.7 and Scala 2.11.
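
For context, the artifactItem above goes inside the maven-dependency-plugin configuration. A minimal sketch of the surrounding plugin block, assuming the usual copy goal is used (the execution id and phase are placeholders, not from the original post):

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-dependency-plugin</artifactId>
        <executions>
            <execution>
                <id>copy-spark-yarn</id> <!-- hypothetical execution id -->
                <phase>prepare-package</phase>
                <goals>
                    <goal>copy</goal>
                </goals>
                <configuration>
                    <artifactItems>
                        <!-- the artifactItem shown above goes here -->
                    </artifactItems>
                </configuration>
            </execution>
        </executions>
    </plugin>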
