
How to Export Results of a SQL Query from Databricks to Azure Data Lake Store

I am trying to export the results of a spark.sql query in Databricks to a folder in Azure Data Lake Store (ADLS).

The tables that I'm querying are also in ADLS.

I have accessed the files in ADLS from Databricks with the following command:

base = spark.read.csv("adl://carlslake.azuredatalakestore.net/landing/",inferSchema=True,header=True)
base.createOrReplaceTempView('basetable')

I am querying the table with the following command:

try:
  dataframe = spark.sql("select * from basetable where LOAD_ID = 1199")
except:
  print("Exception occurred 1199")
else:
  print("Table Load_id 1199")

I am then attempting to export the results to the folder in Azure using the following:

try:
  dataframe.coalesce(1).write.option("header","true").mode("overwrite").csv("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles/")
  rename_file("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles", "adl://carlslake.azuredatalakestore.net/landing/RAW", "csv", "Delta_LoyaltyAccount_merged")
except:
  print("Exception Occurred 1166")
else:
  print("Delta File Created")

There are two weird issues here:

  1. I have specified the query on load_id = 1199, and although there is no load_id = 1199, the query still succeeds.

  2. I would like the second "try" statement to fail if the first "try" failed, but the second try statement runs regardless of the first "try" statement.
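The second issue follows directly from Python semantics: two separate try blocks are independent, so the second one runs whether or not the first raised; nesting the dependent step in the first block's `else` clause fixes that. A minimal sketch of the control flow, using hypothetical `run_query`/`write_output` stand-ins rather than the real Spark calls:

```python
# Sketch: two separate try blocks are independent; the second runs
# even when the first fails. run_query and write_output are hypothetical
# stand-ins for the spark.sql and write steps, not real Databricks APIs.

def run_query():
    raise RuntimeError("query failed")

def write_output():
    return "written"

# Pattern from the question: two independent try blocks.
log = []
try:
    df = run_query()
except Exception:
    log.append("query exception")

try:
    log.append(write_output())   # still runs despite the failure above
except Exception:
    log.append("write exception")

# Fix: put the dependent step in the first try's else clause,
# so it only runs when the query succeeded.
log2 = []
try:
    df = run_query()
except Exception:
    log2.append("query exception")
else:
    log2.append(write_output())  # skipped because run_query raised

print(log)   # ['query exception', 'written']
print(log2)  # ['query exception']
```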

Can someone let me know where I'm going wrong?

The table can be viewed here: thetable

Just thought I would share the answer with you;

try:
  dataframe = spark.sql("select * from basetable where LOAD_ID = 1166")
except Exception as e:
  print("Exception occurred 1166:", e)
else:
  if dataframe.count() == 0:
    print("No data rows 1166")
  else:
    dataframe.coalesce(1).write.option("header","true").mode("overwrite").csv("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles/")
    rename_file("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles", "adl://carlslake.azuredatalakestore.net/landing/RAW", "csv", "Delta_LoyaltyAccount_merged")

I hope it works for you too.
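It is worth noting why the `count()` check is needed at all: `spark.sql` is lazily evaluated, so it builds a query plan and returns a DataFrame without touching the data. A filter that matches zero rows therefore raises no exception, and only an action such as `count()` or a write reveals the empty result. A rough pure-Python analogy using a generator (the sample rows and the 1199 filter are made up for illustration):

```python
# Rough analogy for Spark's lazy evaluation: a generator expression,
# like spark.sql, defines a pipeline without executing it.
# The sample rows and the 1199 filter are illustrative only.

rows = [{"LOAD_ID": 1166}, {"LOAD_ID": 1167}]

# "Transformation": built eagerly, evaluated lazily -- no error and no
# hint yet that nothing matches LOAD_ID == 1199.
matches = (r for r in rows if r["LOAD_ID"] == 1199)

# "Action": only consuming the pipeline reveals the empty result,
# which is why the answer checks dataframe.count() == 0.
result = list(matches)
print(len(result))  # 0
```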


 