
Overwrite specific partitions in spark dataframe write method with Delta format

When using the Parquet format, I am able to overwrite a specific partition with the setting below, without affecting data in other partition folders:

spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic")

data.toDF().write.mode("overwrite").format("parquet").partitionBy("date", "name").save("abfss://path/to/somewhere")

But this does not work with the Delta format in Databricks. How can I handle this with the Delta format?
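
For reference, Delta Lake provides a replaceWhere write option that replaces only the rows matching a predicate, and Delta Lake 2.0+ (recent Databricks runtimes) also accepts partitionOverwriteMode as a per-write option. A minimal sketch, assuming the same path and partition columns as above (the predicate date value is an illustrative placeholder):

# Sketch 1: replace only the rows of one partition via replaceWhere
# ('2021-01-01' is an illustrative placeholder value)
data.toDF().write.format("delta").mode("overwrite").option("replaceWhere", "date = '2021-01-01'").save("abfss://path/to/somewhere")

# Sketch 2: on Delta Lake 2.0+ / recent Databricks runtimes, dynamic partition
# overwrite is also supported as a per-write option
data.toDF().write.format("delta").mode("overwrite").option("partitionOverwriteMode", "dynamic").partitionBy("date", "name").save("abfss://path/to/somewhere")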

Create a mount between the storage account and Azure Databricks; it will give you a storage location for the Delta table. Please follow the syntax below.

dbutils.fs.mount(
    source = "wasbs://<container_name>@<Storage_Account_Name>.blob.core.windows.net/",
    mount_point = "/mnt/<Mount_name>",
    extra_configs = {"fs.azure.account.key.<Storage_Account_Name>.blob.core.windows.net":"<Azure_Storage_Access_key>"})
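
Once mounted, a quick way to confirm the container is reachable is to list the mount point (a minimal check using the standard dbutils.fs.ls call in a Databricks notebook):

# Verify the mount by listing its contents
display(dbutils.fs.ls("/mnt/<Mount_name>"))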
    

Then attach the DataFrame to the write call, use partitionBy according to the schema, and finally save the data to the mount location where you created the Delta table.

If you want the Delta format instead, just change it to .format("delta"):

spark.conf.set("spark.sql.sources.partitionOverwriteMode","dynamic")
df.write.partitionBy("DateID", "MedallionID").mode("overwrite").format("parquet").save("/mnt/dem123")
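
Applying the note above, a minimal sketch of the same write with the Delta format, plus a read back from the mount (same /mnt/dem123 path as above; delta_df is an illustrative name):

# Same write, persisted as a Delta table (note: with mode("overwrite") and no
# replaceWhere/partitionOverwriteMode option, this overwrites the whole table)
df.write.partitionBy("DateID", "MedallionID").mode("overwrite").format("delta").save("/mnt/dem123")

# Read the Delta table back from the mount location
delta_df = spark.read.format("delta").load("/mnt/dem123")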

Reference 1

