
How to load data from Azure Databricks SQL to GCP Databricks SQL

Is there an easy way to load data from an Azure Databricks Spark DB to a GCP Databricks Spark DB?

  1. Obtain the JDBC connection details from the Azure instance and use them in GCP to pull the data, just as from any other JDBC source (a sketch of persisting the result into a GCP-side table follows the snippet):
// Run in the GCP workspace (Databricks notebook): pull the table from the Azure workspace over JDBC
val some_table = spark.read
  .format("jdbc")
  .option("url", "jdbc:databricks://adb-xxxx.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/xxxx;AuthMech=3;UID=token;PWD=xxxx")
  .option("dbtable", "some_table")
  .load()
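The JDBC read only gives you a DataFrame on the GCP cluster; to actually land the data in the GCP workspace's metastore you still need to write it out. A minimal sketch, assuming Delta as the target format; the table name imported_some_table is just an illustrative placeholder, not from the original answer:

// Run in the GCP workspace after the JDBC read above.
// Persist the pulled DataFrame as a managed Delta table so it is
// queryable from Databricks SQL on the GCP side.
// "default.imported_some_table" is only an illustrative placeholder name.
some_table.write
  .format("delta")
  .mode("overwrite")
  .saveAsTable("default.imported_some_table")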
  2. Assuming the Azure data is stored in Blob/ADLS Gen2 storage, mount it in the GCP instance's DBFS and read the data directly (a sketch of copying it into a GCP-side table follows the snippet):
// This is run in GCP instance
// Assuming ADLSv2 on Azure side
val configs = Map(
  "fs.azure.account.auth.type" -> "OAuth",
  "fs.azure.account.oauth.provider.type" -> "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
  "fs.azure.account.oauth2.client.id" -> "<application-id>",
  "fs.azure.account.oauth2.client.secret" -> dbutils.secrets.get(scope="<scope-name>",key="<service-credential-key-name>"),
  "fs.azure.account.oauth2.client.endpoint" -> "https://login.microsoftonline.com/<directory-id>/oauth2/token")

dbutils.fs.mount(
  source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/",
  mountPoint = "/mnt/<mount-name>",
  extraConfigs = configs)

val some_data = spark.read
  .format("delta")
  .load("/mnt/<mount-name>/<some_schema>/<some_table>")
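As with the JDBC approach, the mounted data still lives in Azure storage; to finish the load into GCP you can write the DataFrame into a table on the GCP side and then drop the mount. A minimal sketch, again with a placeholder table name:

// Run in the GCP workspace after the read above.
// Copy the data into a managed Delta table on the GCP side, then
// unmount the Azure storage once the copy is done.
// "default.copied_some_table" is only an illustrative placeholder name.
some_data.write
  .format("delta")
  .mode("overwrite")
  .saveAsTable("default.copied_some_table")

dbutils.fs.unmount("/mnt/<mount-name>")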
