
Merge multiple csv files to one csv file in Azure Blob Storage using pyspark

I am using the code below to save csv files back to blob storage, but because it runs in a loop it creates multiple files. I would now like to merge them into one single csv file. I have tried dbutils.fs.cp/mv, but it did not help.

while start_date <= end_date:
    df = spark.read.format("com.databricks.spark.csv").options(header="true", inferschema="true").load(inputFilePath)
    df.coalesce(1).write.mode("append").option("header","true").format("com.databricks.spark.csv").save(TargetPath)

A similar question has been posted before, but it was solved with a pandas data frame, and I am looking for a solution with a Spark dataframe: "Copy data from multiple csv files into one csv file".

My suggestion would be to use the while loop only to build a list of csv files to read, and then use the Spark csv reader to read them all at once into a single dataframe. For example:

files = []
while start_date <= end_date:
    files.append(inputFilePath)  # path of the file for the current date
    # advance start_date here, as in the original loop

# spark.read.csv accepts a list of paths, so all files are read into one dataframe
df = spark.read.options(header="true", inferschema="true").csv(files)

df.coalesce(1).write.mode("append").option("header", "true").format("com.databricks.spark.csv").save(TargetPath)
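To make the list-building step concrete, here is a minimal, self-contained sketch of generating one input path per day between two dates. The `daily_paths` helper, the `pattern` argument, and the `wasbs://` path shown are all hypothetical illustrations (the question elides how `inputFilePath` is actually constructed); adapt the pattern to your real blob layout.

```python
from datetime import date, timedelta

def daily_paths(start_date, end_date, pattern):
    """Build one input path per day, suitable for passing to spark.read.csv."""
    paths = []
    current = start_date
    while current <= end_date:
        # pattern is a format string with one {} slot for the ISO date
        paths.append(pattern.format(current.isoformat()))
        current += timedelta(days=1)
    return paths

# Hypothetical example: three daily files in a blob container
files = daily_paths(
    date(2020, 1, 1),
    date(2020, 1, 3),
    "wasbs://container@account.blob.core.windows.net/data/{}.csv",
)
# files now holds one path per day: .../2020-01-01.csv, .../2020-01-02.csv, .../2020-01-03.csv
```

The resulting list can then be passed directly to `spark.read.csv(files)` as in the answer above.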
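One caveat: even with `coalesce(1)`, Spark writes `TargetPath` as a directory containing a single `part-*.csv` file (plus `_SUCCESS` markers), not a lone csv file. A common workaround on Databricks is to locate that part file and copy it to a stable name with `dbutils.fs.cp`. The sketch below is an assumption-laden illustration: `find_part_file` is a hypothetical helper, and the commented `dbutils` lines only run inside a Databricks notebook.

```python
def find_part_file(names):
    """Return the single part-*.csv file name written by coalesce(1)."""
    parts = [n for n in names if n.startswith("part-") and n.endswith(".csv")]
    if len(parts) != 1:
        raise ValueError("expected exactly one part file, got %d" % len(parts))
    return parts[0]

# On Databricks (assumption: TargetPath is the directory written above):
# part = find_part_file([f.name for f in dbutils.fs.ls(TargetPath)])
# dbutils.fs.cp(TargetPath + "/" + part, "/mnt/output/merged.csv")

print(find_part_file(["_SUCCESS", "part-00000-abc.csv"]))  # prints part-00000-abc.csv
```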

