
Overwrite specific partitions in spark dataframe write method with Delta format

When using the Parquet format, I am able to overwrite a specific partition with the setting below, without affecting data in other partition folders:

spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic")

data.toDF().write.mode("overwrite").format("parquet").partitionBy("date", "name").save("abfss://path/to/somewhere")

But this does not work with the Delta format in Databricks. How can I handle this with the Delta format?
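
For reference, Delta Lake provides a replaceWhere write option that replaces only the rows matching a predicate, and Delta Lake 2.0+ (recent Databricks runtimes) also accepts partitionOverwriteMode as a per-write option. A minimal sketch, assuming the same path and partition columns as above (the predicate date value is an illustrative placeholder):

# Sketch 1: replace only the rows of one partition via replaceWhere
# ('2021-01-01' is an illustrative placeholder value)
data.toDF().write.format("delta").mode("overwrite").option("replaceWhere", "date = '2021-01-01'").save("abfss://path/to/somewhere")

# Sketch 2: on Delta Lake 2.0+ / recent Databricks runtimes, dynamic partition
# overwrite is also supported as a per-write option
data.toDF().write.format("delta").mode("overwrite").option("partitionOverwriteMode", "dynamic").partitionBy("date", "name").save("abfss://path/to/somewhere")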

Create a mount between the storage account and Azure Databricks; it will give you a storage location for the Delta table. Please follow the syntax below.

dbutils.fs.mount(
    source = "wasbs://<container_name>@<Storage_Account_Name>.blob.core.windows.net/",
    mount_point = "/mnt/<Mount_name>",
    extra_configs = {"fs.azure.account.key.<Storage_Account_Name>.blob.core.windows.net":"<Azure_Storage_Access_key>"})
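
Once mounted, a quick way to confirm the container is reachable is to list the mount point (a minimal check using the standard dbutils.fs.ls call in a Databricks notebook):

# Verify the mount by listing its contents
display(dbutils.fs.ls("/mnt/<Mount_name>"))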
    

Then attach the DataFrame to the write call, use partitionBy according to the schema, and finally save the data to the mount location where you created the Delta table.

If you want the Delta format instead, just change it to .format("delta"):

spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic")
df.write.partitionBy("DateID", "MedallionID").mode("overwrite").format("parquet").save("/mnt/dem123")
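
Applying the note above, a minimal sketch of the same write with the Delta format, plus a read back from the mount (same /mnt/dem123 path as above; delta_df is an illustrative name):

# Same write, persisted as a Delta table (note: with mode("overwrite") and no
# replaceWhere/partitionOverwriteMode option, this overwrites the whole table)
df.write.partitionBy("DateID", "MedallionID").mode("overwrite").format("delta").save("/mnt/dem123")

# Read the Delta table back from the mount location
delta_df = spark.read.format("delta").load("/mnt/dem123")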

Reference 1

