
File not found when running Spark Job with input from Google Storage Bucket

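I am running a job on a Google Cloud Dataproc cluster. The job takes one argument: the path to an input file, which is stored in a Google Cloud Storage bucket. I get a FileNotFoundException (trace below). Why does this happen?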

gcloud dataproc jobs submit spark --cluster cluster-1 --class MST.ComputeMST \
    --jars gs://dataproc-211700eb-83ed-456d-a67e-98af9e6fa02d-us/ComputeMST.jar \
    -- gs:///dataproc-211700eb-83ed-456d-a67e-98af9e6fa02d-us/input.txt

Job [8b193fcd-1350-462b-ae11-373333e868fe] submitted.
Waiting for job output...
17/05/16 05:06:02 INFO com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystemBase: GHFS version: 1.6.1-hadoop2
number of runs = 0
Exception in thread "main" java.io.FileNotFoundException: gs:/dataproc-211700eb-83ed-456d-a67e-98af9e6fa02d-us/input.txt (No such file or directory)
  at java.io.FileInputStream.open0(Native Method)
  at java.io.FileInputStream.open(FileInputStream.java:195)
  at java.io.FileInputStream.<init>(FileInputStream.java:138)
  at java.io.FileInputStream.<init>(FileInputStream.java:93)
  at java.io.FileReader.<init>(FileReader.java:58)
  at MST.ComputeMST.main(ComputeMST.java:670)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:736)
  at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
  at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
ERROR: (gcloud.dataproc.jobs.submit.spark) Job [8b193fcd-1350-462b-ae11-373333e868fe] entered state [ERROR] while waiting for [DONE].

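Although the GCS connector is installed by default on Cloud Dataproc clusters, a job cannot use it through the java.io.FileReader interface. FileReader only opens files on the local filesystem, so it treats the gs:// argument as a local path, which is why the trace shows the path as gs:/dataproc-.../input.txt.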
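To illustrate why the exception message shows a single slash after gs:, here is a small sketch using only the JDK. The class name GcsPathDemo and the bucket name example-bucket are made up for illustration, and the collapsed-slash result assumes a Unix-like system such as the Linux nodes of a Dataproc cluster.

```java
import java.io.File;
import java.net.URI;

public class GcsPathDemo {
    // Returns the path string that java.io.File (and therefore FileReader)
    // would try to open on the local filesystem.
    static String asLocalPath(String arg) {
        return new File(arg).getPath();
    }

    public static void main(String[] args) {
        // Hypothetical location; not the bucket from the question.
        String arg = "gs://example-bucket/input.txt";

        // java.io.File knows nothing about URI schemes. On Linux it collapses
        // the double slash and looks for a relative local path.
        System.out.println(asLocalPath(arg)); // gs:/example-bucket/input.txt on Linux

        // A URI-aware API sees a scheme, a bucket (authority) and an object path.
        URI uri = URI.create(arg);
        System.out.println(uri.getScheme());    // gs
        System.out.println(uri.getAuthority()); // example-bucket
        System.out.println(uri.getPath());      // /input.txt
    }
}
```

The first printed line has the same shape as the path in the FileNotFoundException above: the string was never interpreted as a GCS location at all.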

To access GCS objects through the GCS connector, you need to use Hadoop's FileSystem interface.
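A minimal sketch of what that could look like, not the asker's actual code: the class GcsInputExample and the helper openReader are hypothetical names, and the sketch assumes the Hadoop client libraries and the cluster's Hadoop configuration (which registers the GCS connector for gs:// paths on Dataproc) are on the job's classpath.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GcsInputExample {
    // Opens a location such as gs://bucket/object through Hadoop's FileSystem
    // API instead of java.io.FileReader. Hypothetical helper for illustration.
    static BufferedReader openReader(String location, Configuration conf) throws IOException {
        Path path = new Path(location);
        // Picks the FileSystem implementation registered for the path's scheme
        // (the GCS connector for gs://, HDFS for hdfs://, and so on).
        FileSystem fs = path.getFileSystem(conf);
        return new BufferedReader(
                new InputStreamReader(fs.open(path), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the default Configuration picks up the cluster settings.
        Configuration conf = new Configuration();
        try (BufferedReader reader = openReader(args[0], conf)) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // replace with the job's own parsing
            }
        }
    }
}
```

Because the rest of the program only sees a BufferedReader, swapping this in for new FileReader(args[0]) should need few other changes. One thing to check when switching: the command above passes gs:/// with three slashes, and a URI-aware API would typically parse that form with an empty authority, leaving the bucket name inside the path. The usual form is gs://bucket/object.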

