How to keep an RDD persistent and consistent?

Question

I have the following code (a simplification of a more complex situation):

val newRDD = prevRDD.flatMap { a =>
    // For each input element, emit 0 to 9 random values, each 0 or 1 (unseeded RNG)
    Array.fill[Int](scala.util.Random.nextInt(10)){scala.util.Random.nextInt(2)}
}.persist()
val a = newRDD.count
val b = newRDD.count

Even though the RDD is supposed to be persisted (and therefore consistent), a and b are not identical in most cases.

Is there a way to keep the results of the first action consistent, so that when the second action is called, the results of the first action are returned?

* Edit *

The issue I have is apparently caused by a zipWithIndex call in my code, which creates indices higher than the count. I'll ask about it in a different thread. Thanks.

Answer 1

There is no way to guarantee 100% consistency.

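When you call persist, Spark tries to cache all partitions in memory if they fit. Any partitions that do not fit in memory are recomputed the next time they are needed. Because the flatMap in the question draws from an unseeded random generator, a recomputed partition can contain different elements than it did during the first action.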
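One way to make recomputation harmless is to make the transformation itself deterministic. The following is a minimal sketch, not the asker's actual code: the object DeterministicExpand, the helper expandPartition, and the base seed 42L are all hypothetical names and values introduced for illustration. It seeds one random generator per partition from a fixed base seed plus the partition index, so a recomputed partition yields the same output, assuming the input partition delivers the same elements in the same order each time.

```scala
import scala.util.Random

object DeterministicExpand {
  // Hypothetical helper: expands every element of one partition with an RNG
  // seeded from (baseSeed + partition index). Recomputing the same partition
  // over the same input therefore produces the same sequence.
  def expandPartition[A](baseSeed: Long)(index: Int, iter: Iterator[A]): Iterator[Int] = {
    val rng = new Random(baseSeed + index)
    // Same shape as the question's flatMap: 0 to 9 values, each 0 or 1.
    iter.flatMap(_ => Seq.fill(rng.nextInt(10))(rng.nextInt(2)))
  }
}

// Usage with Spark on the classpath (prevRDD is the RDD from the question):
// val newRDD = prevRDD.mapPartitionsWithIndex { (i, it) =>
//   DeterministicExpand.expandPartition(42L)(i, it)
// }.persist()
```

Persisting with a storage level that spills to disk, such as StorageLevel.MEMORY_AND_DISK, reduces how often partitions are recomputed, but a lost executor can still force a recomputation, which is why the sketch fixes the seed rather than relying on the cache alone.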

Solution 1
1 2018-08-27 06:55:43
