How to Export Results of a SQL Query from Databricks to Azure Data Lake Store
I am trying to export the results of a spark.sql query in Databricks to a folder in Azure Data Lake Store (ADLS).
The tables I'm querying are also in ADLS.
I have accessed the files in ADLS from Databricks with the following command:
base = spark.read.csv("adl://carlslake.azuredatalakestore.net/landing/",inferSchema=True,header=True)
base.createOrReplaceTempView('basetable')
I am querying the table with the following command:
try:
    dataframe = spark.sql("select * from basetable where LOAD_ID = 1199")
except:
    print("Exception occurred 1166")
else:
    print("Table Load_id 1166")
I am then attempting to export the results to the folder in Azure using the following:
try:
    dataframe.coalesce(1).write.option("header","true").mode("overwrite").csv("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles/")
    rename_file("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles", "adl://carlslake.azuredatalakestore.net/landing/RAW", "csv", "Delta_LoyaltyAccount_merged")
except:
    print("Exception Occurred 1166")
else:
    print("Delta File Created")
There are two weird issues here:
I have specified the query on load_id = 1199, and although there is no load_id = 1199, the query is still successful.
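For context on the first issue: Spark transformations are lazily evaluated, so `spark.sql` only builds a query plan; it raises nothing just because no row matches, and nothing actually runs until an action (such as `count()` or a write) is triggered. A plain-Python generator analogy of that behaviour (`lazy_query` and the sample rows are illustrative, not Spark API):

```python
def lazy_query(rows, load_id):
    # Like spark.sql, this only *describes* the filter; nothing is
    # evaluated and no error is raised here, even if no row will
    # ever match.
    return (row for row in rows if row["LOAD_ID"] == load_id)

rows = [{"LOAD_ID": 1166, "value": "a"}, {"LOAD_ID": 1166, "value": "b"}]

plan = lazy_query(rows, 1199)   # "succeeds" despite no matching LOAD_ID
results = list(plan)            # only this step actually runs the filter
print(len(results))             # 0 -> the query was "successful" but empty
```

In Spark the equivalent emptiness check is `dataframe.count() == 0` (or `dataframe.rdd.isEmpty()`), which is what the answer further down uses.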
I would like the second "try" statement to fail if the first "try" failed, but the second try statement runs regardless of the first "try" statement.
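On the second issue: two separate try blocks are independent. The first `except` swallows the error, so Python simply continues on to the second block. If the export should only run when the query succeeded, return early (or re-raise) after a failure. A minimal sketch of that control flow, with stand-in callables rather than real Spark calls:

```python
def run_pipeline(query, export):
    """Run `export` only if `query` succeeded."""
    try:
        result = query()
    except Exception as exc:
        print(f"Query failed: {exc}")
        return False          # stop here; export is never attempted
    try:
        export(result)
    except Exception as exc:
        print(f"Export failed: {exc}")
        return False
    return True

def bad_query():
    raise RuntimeError("table not found")

exported = []
run_pipeline(bad_query, exported.append)   # export step is skipped
print(exported)                            # [] -> nothing was exported
```

The alternative is to put both steps in a single try block, so an exception in the query jumps straight to the one `except` and the export line is never reached.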
Can someone let me know where I'm going wrong?
Just thought I would share the answer with you:
try:
    dataframe = spark.sql("select * from basetable where LOAD_ID = 1166")
except:
    print("Exception occurred 1166")
if dataframe.count() == 0:
    print("No data rows 1166")
else:
    dataframe.coalesce(1).write.option("header","true").mode("overwrite").csv("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles/")
    rename_file("adl://carlslake.azuredatalakestore.net/jfolder2/outputfiles", "adl://carlslake.azuredatalakestore.net/landing/RAW", "csv", "Delta_LoyaltyAccount_merged")
I hope it works for you too.
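One caveat with the answer as written: if `spark.sql` itself raises, `dataframe` is never bound, so the following `dataframe.count()` line would fail with a `NameError`. Attaching an `else` clause to the try keeps the count/export step inside the success path. A pure-Python sketch of that shape (`run_query`/`export` are stand-in callables and a list stands in for the DataFrame; not the Spark API itself):

```python
def load_and_export(run_query, export, load_id):
    try:
        dataframe = run_query(load_id)
    except Exception:
        print(f"Exception occurred {load_id}")
        return "failed"
    else:
        # Only reached when run_query did not raise, so `dataframe`
        # is guaranteed to be bound here.
        if len(dataframe) == 0:
            print(f"No data rows {load_id}")
            return "empty"
        export(dataframe)
        return "exported"
```

With this shape, a failing query returns "failed" without ever touching `dataframe`, an empty result returns "empty", and the export only runs on a non-empty success.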