I'm trying to read an object from Google Cloud Storage with a Spark job running locally. I previously created that object with another Spark job, also running locally. I see nothing unusual in the logs, and in the Spark UI the job is just stuck.
Before I kick off the read job, I update the Spark config as follows:
val hc = spark.sparkContext.hadoopConfiguration
hc.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
hc.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
hc.set("fs.gs.project.id", credential.projectId)
hc.set("fs.gs.auth.service.account.enable", "true")
hc.set("fs.gs.auth.service.account.email", credential.email)
hc.set("fs.gs.auth.service.account.private.key.id", credential.keyId)
hc.set("fs.gs.auth.service.account.private.key", credential.key)
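Spark also copies any configuration entry prefixed with spark.hadoop. into the Hadoop configuration, so the same settings could be supplied when the session is built instead of being set afterwards. A minimal sketch, assuming a hypothetical helper named GcsConf.sparkHadoopSettings (not part of the original code) whose plain string arguments stand in for the credential fields above:

```scala
// Hypothetical helper, not from the original post: returns the same GCS
// settings as above, with each key prefixed by "spark.hadoop." so that
// Spark forwards it to the Hadoop configuration.
object GcsConf {
  def sparkHadoopSettings(
      projectId: String,
      email: String,
      keyId: String,
      key: String
  ): Map[String, String] = {
    val hadoopSettings = Map(
      "fs.gs.impl" -> "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem",
      "fs.AbstractFileSystem.gs.impl" -> "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS",
      "fs.gs.project.id" -> projectId,
      "fs.gs.auth.service.account.enable" -> "true",
      "fs.gs.auth.service.account.email" -> email,
      "fs.gs.auth.service.account.private.key.id" -> keyId,
      "fs.gs.auth.service.account.private.key" -> key
    )
    // Add the prefix Spark looks for when building the Hadoop configuration.
    hadoopSettings.map { case (k, v) => s"spark.hadoop.$k" -> v }
  }
}
```

Each entry of the returned map can then be passed to SparkSession.builder().config(k, v) before getOrCreate() is called.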
Then I simply read like this:
val path = "gs://mybucket/data.csv"
val options = Map("credentials" -> credential.base64ServiceAccount, "parentProject" -> credential.projectId)
spark.read.format("csv")
  .options(options)
  .load(path)
My service account has the following roles. I added every role I could find for object storage:
Storage Admin
Storage Object Admin
Storage Object Creator
Storage Object Viewer
This is how I previously wrote the object:
val path = "gs://mybucket/data.csv"
val options = Map("credentials" -> credential.base64ServiceAccount, "parentProject" -> credential.projectId, "header" -> "true")
val writer = df.write.format("csv").options(options)
writer.save(path)
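Spark's CSV writer creates a directory of part files at the target path rather than a single file, so one way to check what the write actually produced, independently of the Spark read, is to list the path through the Hadoop FileSystem API. This is only a sketch: ListGcsPath and describe are hypothetical names, and the code assumes the Hadoop configuration already carries the GCS settings shown earlier.

```scala
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical diagnostic helper, not from the original post.
object ListGcsPath {
  // Returns one line per entry found at `path`, or a single message
  // if nothing exists there. Works for any Hadoop-supported scheme.
  def describe(path: String, conf: Configuration): Seq[String] = {
    val fs = FileSystem.get(new URI(path), conf)
    val p = new Path(path)
    if (!fs.exists(p)) Seq(s"$path does not exist")
    else fs.listStatus(p).toSeq.map(st => s"${st.getPath} (${st.getLen} bytes)")
  }
}
```

For example, ListGcsPath.describe("gs://mybucket/data.csv", spark.sparkContext.hadoopConfiguration).foreach(println) would print the entries under the path. If this call also hangs, the problem likely sits in the connector or its dependencies rather than in the CSV reader.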
These are my dependencies:
Seq(
"org.apache.spark" %% "spark-core" % "3.1.1",
"org.apache.hadoop" % "hadoop-client" % "3.3.1",
"com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.23.0",
"com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop3-2.2.4",
"com.google.cloud" % "google-cloud-storage" % "2.2.1"
)
Any idea why the write would succeed but the read gets stuck like this?
I was using versions of the dependencies that were not the latest. Once I updated the Google connector dependencies to the latest versions (as of December 2021), reading from Google Cloud Storage worked as well as writing to it.
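The exact versions that fixed the problem are not stated, so the fragment below only sketches one way to keep the Google connector versions in a single place in build.sbt so they can be bumped together. The three version values are placeholders (assumptions, not real releases) and must be replaced with versions looked up on Maven Central.

```scala
// build.sbt fragment. The three values below are placeholders, not real
// version numbers: replace each with the release you want to pin.
val bigQueryConnectorVersion = "<latest>"
val gcsConnectorVersion      = "hadoop3-<latest>"
val cloudStorageVersion      = "<latest>"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.1.1",
  "org.apache.hadoop" % "hadoop-client" % "3.3.1",
  "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % bigQueryConnectorVersion,
  "com.google.cloud.bigdataoss" % "gcs-connector" % gcsConnectorVersion,
  "com.google.cloud" % "google-cloud-storage" % cloudStorageVersion
)
```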