简体   繁体   中英

Dataproc; Spark job fails on Dataproc Spark cluster, but runs locally

I have a JAR file generated via a Maven project that works fine when I run it locally via java -jar JARFILENAME.jar. However, when I try to run the same JAR file on Dataproc I get the following error:

22/06/27 13:13:45 INFO org.apache.spark.SparkEnv: Registering BlockManagerMaster
22/06/27 13:13:46 INFO org.apache.spark.SparkEnv: Registering BlockManagerMasterHeartbeat
22/06/27 13:13:46 INFO org.apache.spark.SparkEnv: Registering OutputCommitCoordinator
22/06/27 13:13:49 INFO org.sparkproject.jetty.util.log: Logging initialized @7373ms to org.sparkproject.jetty.util.log.Slf4jLog
22/06/27 13:13:51 INFO com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl: Ignoring exception of type GoogleJsonResponseException; verified object already exists with desired state.
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.catalyst.expressions.aggregate.ApproximatePercentile$PercentileDigest.getPercentiles([D)Lscala/collection/Seq;
    at com.amazon.deequ.analyzers.ApproxQuantile.fromAggregationResult(ApproxQuantile.scala:84)
    at com.amazon.deequ.analyzers.ScanShareableAnalyzer.metricFromAggregationResult(Analyzer.scala:192)
    at com.amazon.deequ.analyzers.ScanShareableAnalyzer.metricFromAggregationResult$(Analyzer.scala:185)
    at com.amazon.deequ.analyzers.ApproxQuantile.metricFromAggregationResult(ApproxQuantile.scala:50)
    at com.amazon.deequ.analyzers.runners.AnalysisRunner$.successOrFailureMetricFrom(AnalysisRunner.scala:362)
    at com.amazon.deequ.analyzers.runners.AnalysisRunner$.$anonfun$runScanningAnalyzers$5(AnalysisRunner.scala:330)
    at scala.collection.immutable.List.map(List.scala:297)
    at com.amazon.deequ.analyzers.runners.AnalysisRunner$.liftedTree1$1(AnalysisRunner.scala:328)
    at com.amazon.deequ.analyzers.runners.AnalysisRunner$.runScanningAnalyzers(AnalysisRunner.scala:318)
    at com.amazon.deequ.analyzers.runners.AnalysisRunner$.doAnalysisRun(AnalysisRunner.scala:167)
    at com.amazon.deequ.VerificationSuite.doVerificationRun(VerificationSuite.scala:121)
    at com.amazon.deequ.VerificationRunBuilder.run(VerificationRunBuilder.scala:173)
    at com.amazon.deequ.thesis.GCTestOne$.$anonfun$main$1(GCTestOne.scala:42)
    at com.amazon.deequ.thesis.GCTestOne$.$anonfun$main$1$adapted(GCTestOne.scala:11)
    at com.amazon.deequ.examples.ExampleUtils$.withSpark(ExampleUtils.scala:32)
    at com.amazon.deequ.thesis.GCTestOne$.main(GCTestOne.scala:11)
    at com.amazon.deequ.thesis.GCTestOne.main(GCTestOne.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I quite don't get why Dataproc has a NoSuchMethodError when everything runs fine locally.

Someone knows why this is?

Version mismatch with GCP. I had Spark 3.2.1, but the clusters run on 3.1.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM