
How to save 15k CSV files in Databricks / Azure Data Lake

I have a question: how should I download .csv files from Azure Data Lake, make some calculations on them, and save the result as .csv again? I know that for downloading a .csv I can use:

data = pd.read_csv('example.csv')  # example

new_data = data // 2 + data  # calculation in a Databricks notebook

Now the question is how to save new_data in .csv format in Azure Data Lake under the name example_calculated.csv.

To access files from ADLS you need to mount an Azure Data Lake Storage Gen2 filesystem to DBFS.
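For reference, a minimal mounting sketch using a service principal via OAuth; the container, storage account, tenant, secret scope, and mount point below are placeholders, not values from the question:

# Mount an ADLS Gen2 container to DBFS (placeholder values throughout)
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope="<scope-name>", key="<secret-key>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/adls",
    extra_configs=configs,
)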

To read files from ADLS, use the code below.

df = (spark.read.format("csv")
      .option("inferSchema", "true")
      .option("header", "true")
      .option("delimiter", ",")
      .load(file_location))

After applying transformations to the data, you can write it out as a CSV file. Follow the code below.

target_folder_path = 'path_to_adls_folder'

# write as CSV data
df.write.format("csv").save(target_folder_path + "/example_calculated.csv")

Then you will have to rename the saved CSV file using dbutils.fs.mv. (Spark's save() actually writes a directory of part files rather than a single example_calculated.csv, which is why a rename step is needed.)

Although it rather copies and then deletes the old file; there is no real rename function in Databricks.

dbutils.fs.mv(old_name, new_name)
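As a concrete sketch of that step (the folder and file names follow the earlier example and are assumptions, not part of the original answer; it also assumes the DataFrame was written from a single partition, e.g. with df.coalesce(1), so the folder holds exactly one part file):

# Spark's save() produced a folder, not a file; locate the part file inside it
output_dir = target_folder_path + "/example_calculated.csv"
part_file = [f.path for f in dbutils.fs.ls(output_dir) if f.name.startswith("part-")][0]

# Move the part file aside, delete the leftover folder, then take over its name
dbutils.fs.mv(part_file, target_folder_path + "/_tmp_example_calculated.csv")
dbutils.fs.rm(output_dir, recurse=True)
dbutils.fs.mv(target_folder_path + "/_tmp_example_calculated.csv", target_folder_path + "/example_calculated.csv")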

For more information you can refer to the article by Ryan Kennedy.

To rename 15k files you can refer to this similar issue answered by sri sivani charan.
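Putting the pieces together for the 15k files, a minimal sketch of the read-calculate-save loop; the input/output sub-folders under the /mnt/adls mount point are hypothetical, and the loop relies on mounted storage also being reachable through the cluster's local /dbfs path, which lets plain pandas read and write the files:

import os
import pandas as pd

input_dir = "/dbfs/mnt/adls/input"    # hypothetical folder holding the source .csv files
output_dir = "/dbfs/mnt/adls/output"  # hypothetical folder for the results
os.makedirs(output_dir, exist_ok=True)

for name in os.listdir(input_dir):
    if not name.endswith(".csv"):
        continue
    data = pd.read_csv(os.path.join(input_dir, name))
    new_data = data // 2 + data  # the calculation from the question
    out_name = name[:-4] + "_calculated.csv"  # example.csv -> example_calculated.csv
    new_data.to_csv(os.path.join(output_dir, out_name), index=False)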
