Writing a DataFrame from an Azure Databricks notebook to an Azure Data Lake Gen2 table

Question

I've created a DataFrame that I would now like to write (export) to my Azure Data Lake Gen2 as a table. A new table needs to be created for this.

In the future I will also need to update this Azure Data Lake Gen2 table with new DataFrames.

In Azure Databricks I've created a connection Azure Databricks -> Azure Data Lake so that I can see my files.

I would appreciate help with how to write this in Spark / PySpark.

Thank you!

Answer 1

Steps to write a DataFrame from an Azure Databricks notebook to Azure Data Lake Gen2:

Step 1: Access the storage account directly using the storage account access key.
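A minimal sketch of Step 1, assuming the notebook's predefined `spark` session and `dbutils` object. The storage account name, secret scope, and secret key shown in the comments are placeholders, not values from the original answer.

```python
def account_key_conf(storage_account: str) -> str:
    """Name of the Spark setting that holds the access key for an ADLS Gen2 account."""
    return f"fs.azure.account.key.{storage_account}.dfs.core.windows.net"


def configure_access(spark, storage_account: str, access_key: str) -> None:
    """Register the storage account access key on the given Spark session."""
    spark.conf.set(account_key_conf(storage_account), access_key)


# In a Databricks notebook (names below are hypothetical placeholders):
# key = dbutils.secrets.get(scope="my-scope", key="storage-account-key")
# configure_access(spark, "mystorageaccount", key)
```

Reading the key from a secret scope rather than pasting it into the notebook keeps it out of the notebook's revision history.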

Step 2: Use dbutils to list the files in the storage account.
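Step 2 might look roughly like the following, assuming the access key has already been configured on the session. The container, account, and folder names in the comments are placeholders.

```python
def abfss_uri(container: str, storage_account: str, path: str = "") -> str:
    """Build an abfss:// URI for a path inside an ADLS Gen2 container."""
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path.lstrip('/')}"


def list_names(dbutils, uri: str) -> list:
    """Return the names of the files and folders directly under `uri`."""
    return [info.name for info in dbutils.fs.ls(uri)]


# In a Databricks notebook (names below are hypothetical placeholders):
# print(list_names(dbutils, abfss_uri("mycontainer", "mystorageaccount", "raw")))
```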

Step 3: Use the previously established DBFS mount point to read the data and create the DataFrame.
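As a sketch of Step 3, the helper below reads a CSV file through a DBFS mount. The mount name and file name in the comments are assumptions, since the original answer does not show them.

```python
def mount_path(mount_name: str, relative_path: str) -> str:
    """Path of a file under a DBFS mount point, e.g. /mnt/<mount_name>/<file>."""
    return f"/mnt/{mount_name}/{relative_path.lstrip('/')}"


def read_csv(spark, path: str):
    """Read a CSV file with a header row and let Spark infer the column types."""
    return (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv(path)
    )


# In a Databricks notebook (names below are hypothetical placeholders):
# df = read_csv(spark, mount_path("flightdata", "airlines.csv"))
# df.printSchema()
```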

Step 4: Write the data into the Azure Data Lake Gen2 account.

Read the airline CSV file and write the output in Parquet format for easy querying.
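The write in Step 4 could be sketched as below. The target URI in the comment is a placeholder, and the default `overwrite` mode is an assumption; `append` would add to existing data instead.

```python
def write_parquet(df, target: str, mode: str = "overwrite") -> None:
    """Write a DataFrame as Parquet files under `target`."""
    df.write.mode(mode).parquet(target)


# In a Databricks notebook (the target path is a hypothetical placeholder):
# write_parquet(df, "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/parquet/airlines")
```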

For more details, refer to "Tutorial: Azure Data Lake Storage Gen2, Azure Databricks & Spark".

Hope this helps. Do let us know if you have any further queries.

Answer 2

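Instead of writing the data in Parquet format, I would recommend using the Delta format, which uses Parquet internally but provides additional features such as ACID transactions. The syntax is: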

df.write.format("delta").save(path)
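To cover the "create a table now, update it later" part of the question, one possible approach is to append each DataFrame to a Delta location and register that location as a table. This is only a sketch: the table name and location are placeholders, and the identifiers are assumed to be trusted, since they are interpolated into the SQL string directly.

```python
def create_table_sql(table_name: str, location: str) -> str:
    """SQL that registers an existing Delta location as a table, if not already registered."""
    return f"CREATE TABLE IF NOT EXISTS {table_name} USING DELTA LOCATION '{location}'"


def save_and_register(spark, df, table_name: str, location: str, mode: str = "append") -> None:
    """Write `df` to a Delta location, then make sure a table points at it."""
    # "append" creates the Delta data on the first call and adds rows on later calls.
    df.write.format("delta").mode(mode).save(location)
    spark.sql(create_table_sql(table_name, location))


# In a Databricks notebook (names below are hypothetical placeholders):
# location = "abfss://mycontainer@mystorageaccount.dfs.core.windows.net/delta/airlines"
# save_and_register(spark, df, "airlines", location)
# save_and_register(spark, new_df, "airlines", location)  # later update
```

Calling the same function again with a new DataFrame appends its rows, and the table picks them up because it points at the same location.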

Answer 1: score 2, accepted, 2020-01-30 08:45:39

Answer 2: score 1, 2020-02-14 03:31:57