
Cloud Dataproc can't access Cloud Storage bucket

I have a Cloud Dataproc Spark job that also uses the Cloud Storage API on the driver side (to pick specific files from the same folder for processing).

Here are the Maven dependencies:

<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>2.4.4</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.google.cloud</groupId>
        <artifactId>google-cloud-storage</artifactId>
        <version>1.101.0</version>
    </dependency>
</dependencies>

Here is the simplest version of the failing code:

import com.google.cloud.storage._

object Test {
  def main(args: Array[String]): Unit = {
    // Uses Application Default Credentials from the Dataproc environment
    val storage = StorageOptions.getDefaultInstance().getService()
    // Fails here with a NoSuchMethodError (see the stack trace below)
    storage.list("intent_raw")
  }
}

Here is the stack trace:

Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.util.concurrent.MoreExecutors.directExecutor()Ljava/util/concurrent/Executor;
    at com.google.api.gax.retrying.BasicRetryingFuture.<init>(BasicRetryingFuture.java:84)
    at com.google.api.gax.retrying.DirectRetryingExecutor.createFuture(DirectRetryingExecutor.java:88)
    at com.google.api.gax.retrying.DirectRetryingExecutor.createFuture(DirectRetryingExecutor.java:74)
    at com.google.cloud.RetryHelper.run(RetryHelper.java:75)
    at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:50)
    at com.google.cloud.storage.StorageImpl.listBlobs(StorageImpl.java:372)
    at com.google.cloud.storage.StorageImpl.list(StorageImpl.java:328)
    at ai.mandal.cloud.dataproc.Test$.main(Test.scala:14)
    at ai.mandal.cloud.dataproc.Test.main(Test.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

My question is: what can generally cause this, and given that I am running the job from the Dataproc service (which does have access to the bucket), do I need to configure separate credentials for it?

The solution was to add

spark.executor.userClassPathFirst = true
spark.driver.userClassPathFirst = true

to the job properties.
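For example, when submitting with the gcloud CLI, these can be passed through the --properties flag. This is a minimal sketch: the cluster name, region, and jar path are placeholders, and the main class is the one from the stack trace above.

gcloud dataproc jobs submit spark \
    --cluster=my-cluster \
    --region=us-central1 \
    --class=ai.mandal.cloud.dataproc.Test \
    --jars=gs://my-bucket/my-job.jar \
    --properties=spark.driver.userClassPathFirst=true,spark.executor.userClassPathFirst=true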

The problem is caused by conflicting versions of guava: the one pulled in by google-cloud-storage and the one found in the host environment. The cluster's older Guava is loaded first, and it lacks MoreExecutors.directExecutor() (added in Guava 18), hence the NoSuchMethodError.

Google recommends shading the conflicting guava inside your dependencies; I tried that too, but it did not work in this case (see the sketch below).
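For reference, a minimal sketch of that shading approach with the maven-shade-plugin, assuming a standard Maven build; the relocated package prefix repackaged.com.google.common is an arbitrary choice:

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.2.1</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <relocations>
                            <!-- Relocate Guava so the job's copy cannot clash with the cluster's -->
                            <relocation>
                                <pattern>com.google.common</pattern>
                                <shadedPattern>repackaged.com.google.common</shadedPattern>
                            </relocation>
                        </relocations>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>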

