
How to save 15k CSV files in Databricks / Azure Data Lake

I have a question: how should I download .csv files from Azure Data Lake, make some calculations on them, and save the result as .csv again? I know that for downloading a .csv I can use:

data = pd.read_csv('example.csv')  # example

new_data = data // 2 + data  # calculation in a Databricks notebook

Now the question is how to save new_data in .csv format in Azure Data Lake under the name example_calculated.csv.

To access files from ADLS you need to mount an Azure Data Lake Storage Gen2 filesystem to DBFS.
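For reference, a minimal mounting sketch using a service principal via OAuth; the container, storage account, tenant, secret scope, and mount point below are placeholders, not values from the question:

# Mount an ADLS Gen2 container to DBFS (placeholder values throughout)
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<secret-key>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/adls",
    extra_configs=configs,
)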

To read files from ADLS, use the code below.

df = (spark.read.format("csv")
      .option("inferSchema", "true")
      .option("header", "true")
      .option("delimiter", ",")
      .load(file_location))

After applying transformations to the data, you can write it out as a CSV file. Follow the code below.

target_folder_path = 'path_to_adls_folder'

# write as CSV data
df.write.format("csv").save(target_folder_path + "/example_calculated.csv")

Then you will have to rename the saved CSV file using dbutils.fs.mv. (Spark's save() actually writes a directory of part files rather than a single example_calculated.csv, which is why a rename step is needed.)

Although it rather copies and then deletes the old file; there is no real rename function in Databricks.

dbutils.fs.mv(old_name, new_name)
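As a concrete sketch of that step (the folder and file names follow the earlier example and are assumptions, not part of the original answer; it also assumes the DataFrame was written from a single partition, e.g. with df.coalesce(1), so the folder holds exactly one part file):

# Spark's save() produced a folder, not a file; locate the part file inside it
output_dir = target_folder_path + "/example_calculated.csv"
part_file = [f.path for f in dbutils.fs.ls(output_dir) if f.name.startswith("part-")][0]

# Move the part file aside, delete the leftover folder, then take over its name
dbutils.fs.mv(part_file, target_folder_path + "/_tmp_example_calculated.csv")
dbutils.fs.rm(output_dir, recurse=True)
dbutils.fs.mv(target_folder_path + "/_tmp_example_calculated.csv", target_folder_path + "/example_calculated.csv")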

For more information you can refer to the article by Ryan Kennedy.

To rename 15k files you can refer to this similar issue answered by sri sivani charan.
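Putting the pieces together for the 15k files, a minimal sketch of the read-calculate-save loop; the input/output sub-folders under the /mnt/adls mount point are hypothetical, and the loop relies on mounted storage also being reachable through the cluster's local /dbfs path, which lets plain pandas read and write the files:

import os
import pandas as pd

input_dir = "/dbfs/mnt/adls/input"    # hypothetical folder holding the source .csv files
output_dir = "/dbfs/mnt/adls/output"  # hypothetical folder for the results
os.makedirs(output_dir, exist_ok=True)

for name in os.listdir(input_dir):
    if not name.endswith(".csv"):
        continue
    data = pd.read_csv(os.path.join(input_dir, name))
    new_data = data // 2 + data  # the calculation from the question
    out_name = name[:-4] + "_calculated.csv"  # example.csv -> example_calculated.csv
    new_data.to_csv(os.path.join(output_dir, out_name), index=False)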
