How to write two Spark DataFrames to Redshift atomically?

I am using Databricks spark-redshift to write DataFrames to Redshift. I have two DataFrames that get appended to two separate tables, but I need this to happen atomically, i.e. if the second DataFrame fails to write to its table, the first write must be undone as well. Is there any way to do that?
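For context, here is a minimal sketch of the kind of writes involved, using spark-redshift's DataFrame writer. The DataFrames df1/df2, the JDBC URL, the table names, and the S3 tempdir are all placeholders, and S3 credentials are assumed to be configured already. Each save() is committed independently by the connector, so nothing ties the two loads together:

```python
jdbc_url = "jdbc:redshift://host:5439/mydb?user=me&password=secret"

# Two independent appends: if the second save() fails, the first
# append has already been committed and cannot be rolled back.
for df, table in [(df1, "table_one"), (df2, "table_two")]:
    (df.write
       .format("com.databricks.spark.redshift")
       .option("url", jdbc_url)
       .option("dbtable", table)
       .option("tempdir", "s3n://my-bucket/tmp/")
       .mode("append")
       .save())
```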

The solution is to have a staging table for each target table. To write the Spark results to the database:

  1. Clean the staging tables (DELETE FROM staging_table).
  2. Write the DataFrames to the staging tables using spark-redshift (this step is not atomic).
  3. Atomically copy from the staging tables to the target tables in a single transaction (for Python, use the redshift-sqlalchemy package); see the sketch after this list.
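
A minimal end-to-end sketch of the three steps, assuming PySpark plus SQLAlchemy with the Redshift dialect. Everything named here is a placeholder: the staging tables (staging_a, staging_b) are presumed to already exist with the same schemas as the targets (target_a, target_b), df1/df2 are the DataFrames from the question, and both connection strings need real credentials:

```python
from sqlalchemy import create_engine, text

JDBC_URL = "jdbc:redshift://host:5439/mydb?user=me&password=secret"
TEMP_DIR = "s3n://my-bucket/tmp/"  # S3 staging area used by spark-redshift
engine = create_engine("redshift+psycopg2://me:secret@host:5439/mydb")

def write_to_staging(df, staging_table):
    # Step 2 helper: bulk-load one DataFrame into its staging table.
    # This load commits on its own and is not atomic with the other one.
    (df.write
       .format("com.databricks.spark.redshift")
       .option("url", JDBC_URL)
       .option("dbtable", staging_table)
       .option("tempdir", TEMP_DIR)
       .mode("append")
       .save())

# Step 1: empty the staging tables so leftovers from a failed
# earlier run cannot leak into the targets.
with engine.begin() as conn:
    conn.execute(text("DELETE FROM staging_a"))
    conn.execute(text("DELETE FROM staging_b"))

# Step 2: the two non-atomic loads. If either one fails we stop
# here, and the target tables have not been touched.
write_to_staging(df1, "staging_a")
write_to_staging(df2, "staging_b")

# Step 3: copy both staging tables into the targets in ONE
# transaction. engine.begin() commits only if both INSERTs
# succeed and rolls back otherwise, so the targets change
# atomically or not at all.
with engine.begin() as conn:
    conn.execute(text("INSERT INTO target_a SELECT * FROM staging_a"))
    conn.execute(text("INSERT INTO target_b SELECT * FROM staging_b"))
```

A note on step 1: the answer says DELETE FROM rather than TRUNCATE. On Redshift, TRUNCATE implicitly commits the current transaction, so DELETE is the safer choice if you ever fold the cleanup into a larger transaction.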

Only one instance of the Spark application can be running at a time, i.e. you can't have two jobs writing to the staging tables at the same time, otherwise the resulting data won't be valid.
