
Transactional block | Spark SQL, rdd

I have a processing block inside my dStream.foreachRDD method, and that processing includes persisting to MySQL using Spark SQL. After that, I persist the latest processed offset in another schema/table. I want to make the entire block transactional (Scala). How can I achieve that? Here are the relevant excerpts from the code:

foreachRDD(rdd => {
  // ...

  df.write.mode("append")
    .jdbc(url + rawstore_schema + "?rewriteBatchedStatements=true",
          tablesToFetch(index), connectionProperties)

  // ...
  metricsStatement.executeUpdate("Insert into metrics.txn_offsets (topic,part,off,date_updated) values (...
})

Since the two write operations (processed data and offset data) go over two different databases/connections, how can I make them transactional?

Thanks

I had the same question. Looking through the Spark code (up to v2.1), it doesn't seem to be possible: there is no option to specify transaction management for the JDBC writer.

More details in my other answer here: https://stackoverflow.com/a/42964361/47551
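
A workaround some people use (my own sketch, not something Spark provides out of the box): if the offsets table can live in the same MySQL instance as the data, skip df.write.jdbc for this step and write both the rows and the offsets over a single JDBC connection with auto-commit disabled, so they commit or roll back together. The table names (rawstore.events standing in for tablesToFetch(index)), column layout, and the url/user/pass parameters below are assumptions for illustration:

import java.sql.{Connection, DriverManager, PreparedStatement}
import org.apache.spark.sql.Row
import org.apache.spark.streaming.kafka010.OffsetRange

// Sketch: persist one partition's rows AND the Kafka offset range they
// cover in a single MySQL transaction, instead of df.write.jdbc plus a
// separate executeUpdate on another connection.
def writePartitionTransactionally(rows: Iterator[Row],
                                  range: OffsetRange,
                                  url: String,
                                  user: String,
                                  pass: String): Unit = {
  val conn: Connection = DriverManager.getConnection(url, user, pass)
  conn.setAutoCommit(false) // one transaction per partition
  try {
    // hypothetical data table standing in for tablesToFetch(index)
    val insertData: PreparedStatement =
      conn.prepareStatement("INSERT INTO rawstore.events (k, v) VALUES (?, ?)")
    rows.foreach { r =>
      insertData.setString(1, r.getString(0))
      insertData.setString(2, r.getString(1))
      insertData.addBatch()
    }
    insertData.executeBatch()

    // the offset insert joins the same transaction, on the same connection
    val insertOffsets: PreparedStatement = conn.prepareStatement(
      "INSERT INTO metrics.txn_offsets (topic, part, off, date_updated) " +
        "VALUES (?, ?, ?, NOW()) ON DUPLICATE KEY UPDATE off = VALUES(off)")
    insertOffsets.setString(1, range.topic)
    insertOffsets.setInt(2, range.partition)
    insertOffsets.setLong(3, range.untilOffset)
    insertOffsets.executeUpdate()

    conn.commit() // rows and offsets become visible atomically
  } catch {
    case e: Exception =>
      conn.rollback() // neither rows nor offsets are persisted
      throw e
  } finally {
    conn.close()
  }
}

You would call this from inside foreachRDD via rdd.foreachPartition, pairing each partition with its OffsetRange (obtained from rdd.asInstanceOf[HasOffsetRanges].offsetRanges and TaskContext.get.partitionId). If the two tables really must sit on different MySQL servers, a single local transaction cannot span them; that would require XA/two-phase commit, which Spark's JDBC writer does not offer either.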


 