简体   繁体   中英

In PySpark, is there a way to pass credentials as variables into spark.read?

Spark allows us to read directly from Google BigQuery, as shown below:

df = spark.read.format("bigquery") \
  .option("credentialsFile", "googleKey.json") \
  .option("parentProject", "projectId") \
  .option("table", "project.table") \
  .load()

However having the key saved on the virtual machine, isn't a great idea. I have the Google key saved as JSON securely in a credential management tool. The key is read on-demand and saved into a variable called googleKey.

Is it possible to pass JSON into speak.read, or pass in the credentials as a Dictionary?

The other option is credentials . From spark-bigquery-connector docs :

How do I authenticate outside GCE / Dataproc?

Credentials can also be provided explicitly, either as a parameter or from Spark runtime configuration. They should be passed in as a base64-encoded string directly.

 // Globally spark.conf.set("credentials", "<SERVICE_ACCOUNT_JSON_IN_BASE64>") // Per read/Write spark.read.format("bigquery").option("credentials", "<SERVICE_ACCOUNT_JSON_IN_BASE64>")

This is more like chicken and egg situation. if you are storing credential file in secret manager (hope that's not your credential manager tool). How would you access secret manager. For that you might need key and where would you store that key.

For this, Azure has created a managed identities, through which two different services can talk to each other without providing any keys (credential) explicitly.

If you are running from Dataproc, then the node has a built in service account which you can control on cluster creation. In this case you do not need to pass any credentials/credentialsFile option.

If you are running on another cloud or on prem, you can use the local secret manager, or implement the connector's AccessTokenProvider which lets you full customization of the credentials creation.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM