
How to Export Results of a SQL Query from Databricks to Azure Data Lake Store

I am trying to export the results of a spark.sql query in Databricks to a folder in Azure Data Lake Store (ADLS).

The tables that I'm querying are also in ADLS.

I have accessed the files in ADLS from Databricks with the following command:

base = spark.read.csv("adl://carlslake.azuredatalakestore.net/landing/",inferSchema=True,header=True)
base.createOrReplaceTempView('basetable')

I am querying the table with the following command:

try:
  dataframe = spark.sql("select * from basetable where LOAD_ID = 1199")
except:
  print("Exception occurred 1199")
else:
  print("Table Load_id 1199")

I am then attempting to export the results to the folder in Azure using the following:

try:
  dataframe.coalesce(1).write.option("header","true").mode("overwrite").csv("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles/")
  rename_file("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles", "adl://carlslake.azuredatalakestore.net/landing/RAW", "csv", "Delta_LoyaltyAccount_merged")
except:
  print("Exception Occurred 1166")
else:
  print("Delta File Created")

There are two weird issues here:

  1. I have specified the query on load_id = 1199, and although there is no load_id = 1199, the query still succeeds.

  2. I would like the second "try" statement to fail if the first "try" failed, but the second try statement runs regardless of the first "try" statement.
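The second issue follows directly from Python semantics: two separate try blocks are independent, so the second one runs whether or not the first raised; nesting the dependent step in the first block's `else` clause fixes that. A minimal sketch of the control flow, using hypothetical `run_query`/`write_output` stand-ins rather than the real Spark calls:

```python
# Sketch: two separate try blocks are independent; the second runs
# even when the first fails. run_query and write_output are hypothetical
# stand-ins for the spark.sql and write steps, not real Databricks APIs.

def run_query():
    raise RuntimeError("query failed")

def write_output():
    return "written"

# Pattern from the question: two independent try blocks.
log = []
try:
    df = run_query()
except Exception:
    log.append("query exception")

try:
    log.append(write_output())   # still runs despite the failure above
except Exception:
    log.append("write exception")

# Fix: put the dependent step in the first try's else clause,
# so it only runs when the query succeeded.
log2 = []
try:
    df = run_query()
except Exception:
    log2.append("query exception")
else:
    log2.append(write_output())  # skipped because run_query raised

print(log)   # ['query exception', 'written']
print(log2)  # ['query exception']
```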

Can someone let me know where I'm going wrong?

The table can be viewed here: thetable

Just thought I would share the answer with you;

try:
  dataframe = spark.sql("select * from basetable where LOAD_ID = 1166")
except Exception as e:
  print("Exception occurred 1166:", e)
else:
  if dataframe.count() == 0:
    print("No data rows 1166")
  else:
    dataframe.coalesce(1).write.option("header","true").mode("overwrite").csv("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles/")
    rename_file("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles", "adl://carlslake.azuredatalakestore.net/landing/RAW", "csv", "Delta_LoyaltyAccount_merged")

I hope it works for you too.
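It is worth noting why the `count()` check is needed at all: `spark.sql` is lazily evaluated, so it builds a query plan and returns a DataFrame without touching the data. A filter that matches zero rows therefore raises no exception, and only an action such as `count()` or a write reveals the empty result. A rough pure-Python analogy using a generator (the sample rows and the 1199 filter are made up for illustration):

```python
# Rough analogy for Spark's lazy evaluation: a generator expression,
# like spark.sql, defines a pipeline without executing it.
# The sample rows and the 1199 filter are illustrative only.

rows = [{"LOAD_ID": 1166}, {"LOAD_ID": 1167}]

# "Transformation": built eagerly, evaluated lazily -- no error and no
# hint yet that nothing matches LOAD_ID == 1199.
matches = (r for r in rows if r["LOAD_ID"] == 1199)

# "Action": only consuming the pipeline reveals the empty result,
# which is why the answer checks dataframe.count() == 0.
result = list(matches)
print(len(result))  # 0
```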


 