How to save 15k CSV files in Databricks / Azure Data Lake
I have a question: how should I download a .csv file from Azure Data Lake, make some calculations, and save the result as a .csv again? I know that for downloading a .csv I can use:

data = pd.read_csv('example.csv')  # example
new_data = data // 2 + data  # calculation in a Databricks notebook

The question now is how to save new_data in .csv format in Azure Data Lake under the name example_calculated.csv.
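For clarity, the per-file transformation and the output naming can be sketched in plain Python (the `output_name` helper and its `_calculated` suffix rule are my assumptions based on the example name in the question):

```python
from pathlib import Path

def output_name(input_path: str) -> str:
    """Derive 'example_calculated.csv' from 'example.csv' (assumed naming rule)."""
    p = Path(input_path)
    return f"{p.stem}_calculated{p.suffix}"

def calculate(value: int) -> int:
    """The calculation from the question: new_data = data // 2 + data."""
    return value // 2 + value

print(output_name("example.csv"))  # example_calculated.csv
print(calculate(10))               # 15
```

In the notebook the same calculation applies element-wise to the whole DataFrame.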
To access files from ADLS you need to mount an Azure Data Lake Storage Gen2 filesystem to DBFS.
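A minimal sketch of that mount step, assuming OAuth with a service principal (the storage account, container, and credential values are placeholders, and `dbutils` only exists inside a Databricks notebook, so the actual mount call is shown commented out):

```python
# Placeholder values -- replace with your own storage account, container,
# and service-principal credentials.
storage_account = "mystorageaccount"
container = "mycontainer"
client_id = "<application-id>"
client_secret = "<client-secret>"
tenant_id = "<tenant-id>"

# OAuth configuration expected by the ADLS Gen2 ABFS driver.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": client_id,
    "fs.azure.account.oauth2.client.secret": client_secret,
    "fs.azure.account.oauth2.client.endpoint":
        f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
}

source = f"abfss://{container}@{storage_account}.dfs.core.windows.net/"
mount_point = "/mnt/datalake"

# Inside a Databricks notebook you would then run:
# dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=configs)
```

After mounting, files are reachable under `/mnt/datalake/...` from both Spark and pandas.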
To read files from ADLS, use the code below:

df = spark.read.format("csv") \
    .option("inferSchema", "true") \
    .option("header", "true") \
    .option("delimiter", ",") \
    .load(file_location)
After applying transformations to the data, you can write it out as a CSV file. Follow the code below:

target_folder_path = 'path_to_adls_folder'
# write as CSV data
df.write.format("csv").save(target_folder_path + "/example_calculated.csv")
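Note that `save(...)` produces a *directory* containing one `part-*.csv` file per partition (plus marker files such as `_SUCCESS`), not a single CSV file, which is why a rename step is needed at all. A stdlib sketch that simulates the layout and locates the part file:

```python
import glob
import os
import tempfile

# Simulate the directory layout Spark leaves behind after
# df.write.format("csv").save(...): marker files plus one part file.
out_dir = tempfile.mkdtemp()
for name in ("_SUCCESS", "part-00000-abc123.csv"):
    with open(os.path.join(out_dir, name), "w") as f:
        f.write("")

# Find the CSV part file so it can be moved/renamed afterwards.
part_files = glob.glob(os.path.join(out_dir, "part-*.csv"))
print(part_files[0])
```

Calling `.coalesce(1)` before the write is a common way to guarantee a single part file, at the cost of collecting the data onto one worker.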
Then you will have to rename the saved CSV file using dbutils.fs.mv. Strictly speaking it copies the file and deletes the old one, since there is no real rename function in Databricks:

dbutils.fs.mv(old_name, new_name)
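The copy-then-delete behaviour can be illustrated with the standard library (a local-filesystem analogy, not the DBFS API itself):

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
old_name = os.path.join(tmp, "part-00000-abc123.csv")
new_name = os.path.join(tmp, "example_calculated.csv")

with open(old_name, "w") as f:
    f.write("col\n1\n")

# "Move" = copy to the new name, then delete the original,
# which is effectively what dbutils.fs.mv does.
shutil.copy(old_name, new_name)
os.remove(old_name)

print(os.path.exists(new_name), os.path.exists(old_name))  # True False
```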
For more information, refer to this article by Ryan Kennedy.
To rename 15K files, you can refer to this similar issue answered by sri sivani charan.
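For all 15k files, the read-calculate-write-rename steps can be wrapped in a loop over the file listing (in Databricks the listing would come from `dbutils.fs.ls`; here plain generated filenames and a hypothetical `calculated_name` helper illustrate the naming pass):

```python
def calculated_name(filename: str) -> str:
    """Map 'example.csv' to 'example_calculated.csv' (assumed naming rule)."""
    stem, ext = filename.rsplit(".", 1)
    return f"{stem}_calculated.{ext}"

# Stand-in for the 15k filenames you would get from dbutils.fs.ls(...)
filenames = [f"file_{i:05d}.csv" for i in range(15000)]
renamed = [calculated_name(name) for name in filenames]

print(len(renamed), renamed[0])  # 15000 file_00000_calculated.csv
```

In the real loop, each iteration would read the file, apply the calculation, write the result, and move the part file to its `_calculated` name.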