
Spark Job for Inserting data to Cassandra

I am trying to write data to Cassandra tables using Spark on Scala. Sometimes the Spark task fails partway through, leaving partial writes. When the task is restarted from the beginning, does Spark roll back the partial writes?

No. Spark (and Cassandra, for that matter) does not do a commit-style insert scoped to the whole task. This means your writes must be idempotent; otherwise you can end up with strange behavior.
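The hazard can be sketched without Spark or Cassandra at all. In this plain-Scala toy (all names are illustrative, not real connector API), a table is just a `Map` keyed by primary key: an upsert-style write survives a task retry unchanged, while an increment-style write (think counter columns) gets applied twice.

```scala
// Plain-Scala sketch of why retried tasks need idempotent writes.
// An upsert keyed by primary key is safe to replay; an increment is not.
object RetryHazard {
  // Upsert: last write per key wins, mirroring how Cassandra treats an
  // INSERT whose primary key already exists.
  def upsert(table: Map[String, Int], k: String, v: Int): Map[String, Int] =
    table + (k -> v)

  // Increment: like a counter-column update; replaying it changes the result.
  def increment(table: Map[String, Int], k: String, by: Int): Map[String, Int] =
    table + (k -> (table.getOrElse(k, 0) + by))

  def main(args: Array[String]): Unit = {
    // A retried task replays the same write.
    val once  = upsert(Map.empty, "a", 5)
    val twice = upsert(once, "a", 5)
    println(twice("a")) // still 5: idempotent, replay is harmless

    val inc1 = increment(Map.empty, "a", 5)
    val inc2 = increment(inc1, "a", 5)
    println(inc2("a")) // 10: the replay double-counted
  }
}
```

This is why the answer says writes "must be idempotent": Spark's retry model assumes a task can be re-run from scratch without corrupting the result.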

No, but if I'm right, you can simply reprocess your data, which will overwrite the partial writes. When writing to Cassandra, inserting a row with the same primary key acts as an update (an upsert).
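The reprocessing argument can also be sketched in plain Scala (again a toy `Map`-as-table, not real connector API): a run that failed after writing part of the data, followed by a full replay of the same rows, converges to exactly the same table as a clean run, because the upsert makes each row's final value depend only on the last write for its key.

```scala
// Sketch: replaying a full batch over a partial write converges to the
// same state as a clean run, because each write is an upsert by key.
object ReplayDemo {
  type Row = (String, Int) // (primaryKey, value)

  // Apply a batch of upserts: for each key, the last write wins.
  def upsertAll(table: Map[String, Int], rows: Seq[Row]): Map[String, Int] =
    rows.foldLeft(table) { case (t, (k, v)) => t + (k -> v) }

  def main(args: Array[String]): Unit = {
    val rows = Seq("a" -> 1, "b" -> 2, "c" -> 3)

    // First attempt dies after writing only the first two rows.
    val partial = upsertAll(Map.empty, rows.take(2))

    // The restarted job reprocesses everything from the beginning.
    val retried = upsertAll(partial, rows)

    // Same final table as if the failure had never happened.
    println(retried == upsertAll(Map.empty, rows)) // true
  }
}
```

Note the caveat: this only holds when the replayed batch contains the same rows. If the reprocessed data differs (or rows were deleted upstream), stale rows from the partial write can survive.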
