
Write a csv file into azure blob storage

I am trying to use pyspark to analyze my data on Databricks notebooks. Blob storage has been mounted on the Databricks cluster and, after analyzing, I would like to write the csv back into blob storage. Because pyspark works in a distributed fashion, the csv file is broken into small blocks and written to blob storage. How can I overcome this and write a single csv file to blob storage when doing the analysis with pyspark? Thanks.
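For reference, this is a minimal sketch of the kind of write I mean (the mount point /mnt/blobstorage and the dataframe are placeholders, not my real names); on Databricks the SparkSession is already available as spark:

    # Placeholder data standing in for the analyzed result
    result_df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

    # The usual distributed write produces a *directory* of part files,
    # e.g. part-00000-....csv, part-00001-....csv, one file per partition,
    # rather than a single csv file.
    result_df.write.mode("overwrite").option("header", "true").csv("/mnt/blobstorage/output")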

Also, please let me know whether this can be overcome if we move to Azure Data Lake Storage Gen2. Is it more optimized there, so that the csv can be written as one single file? As I mentioned earlier, the analysis is done on a Databricks notebook with pyspark. Thanks.

Do you really want a single file? If yes, the only way you can overcome it is by merging all the small csv files into a single csv file. You can make use of a map function on the Databricks cluster to merge them, or you can use some background job to do the same.
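One common way to do the merge in pyspark itself is to collapse the DataFrame to a single partition before writing and then rename the lone part file. A minimal sketch, assuming the container is mounted at /mnt/blobstorage and the dataframe is called result_df (both placeholders):

    # coalesce(1) funnels all data through a single task, so this only makes sense
    # when the result comfortably fits on one worker.
    (result_df
        .coalesce(1)
        .write.mode("overwrite")
        .option("header", "true")
        .csv("/mnt/blobstorage/output_tmp"))

    # Spark still writes into a directory, so copy the single part file to the final name.
    # dbutils is available in Databricks notebooks.
    part_file = [f.path for f in dbutils.fs.ls("/mnt/blobstorage/output_tmp")
                 if f.name.startswith("part-")][0]
    dbutils.fs.cp(part_file, "/mnt/blobstorage/result.csv")
    dbutils.fs.rm("/mnt/blobstorage/output_tmp", True)  # remove the temporary directory

If the result is small enough to collect on the driver, result_df.toPandas().to_csv("/dbfs/mnt/blobstorage/result.csv", index=False) avoids the part-file handling entirely.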

Have a look here: https://forums.databricks.com/questions/14851/how-to-concat-lots-of-1mb-cvs-files-in-pyspark.html

