How to keep RDD persisted and consistent?
I have the following code (a simplification of a more complex situation):
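```scala
// prevRDD is an existing RDD; each element expands to 0-9 random values, each 0 or 1
val newRDD = prevRDD.flatMap { _ =>
  Array.fill[Int](scala.util.Random.nextInt(10)){scala.util.Random.nextInt(2)}
}.persist()
val a = newRDD.count()
val b = newRDD.count()
```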
Even though the RDD is supposed to be persisted (and therefore consistent), `a` and `b` are not identical in most cases.
Is there a way to keep the results of the first action consistent, so that when the second action is called, it returns the same results as the first?
**Edit**
The issue I have is apparently caused by the `zipWithIndex` method in my code, which creates indices higher than the count. I'll ask about it in a different thread. Thanks.
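The `zipWithIndex` symptom plausibly has the same root cause as the differing counts. On an RDD with more than one partition, `zipWithIndex` first runs a job that counts the elements of the partitions to derive start offsets, then evaluates the partitions again to assign indices; if a partition is recomputed in between and comes out with a different size, the offsets no longer match. The plain-Scala sketch below (no Spark required) is a simplified model of those two passes, not Spark's implementation. The `Partition` type and the `drifting` helper are hypothetical stand-ins for an uncached partition of the random `flatMap`.

```scala
object ZipWithIndexDrift {
  // A partition is modelled as a function that (re)computes its elements on each call.
  type Partition = () => Seq[Int]

  // Hypothetical helper: returns `firstSize` elements on the first evaluation and
  // `laterSize` elements afterwards, imitating an uncached random partition that drifts.
  def drifting(firstSize: Int, laterSize: Int): Partition = {
    var evaluations = 0
    () => {
      evaluations += 1
      Seq.fill(if (evaluations == 1) firstSize else laterSize)(0)
    }
  }

  // Simplified two-pass model of zipWithIndex:
  // pass 1 counts partitions to derive start offsets (the last partition's size is not needed),
  // pass 2 re-evaluates every partition and numbers its elements from that offset.
  def zipWithIndexTwoPass(parts: IndexedSeq[Partition]): Seq[(Int, Long)] = {
    val sizes = parts.init.map(p => p().size.toLong) // pass 1
    val starts = sizes.scanLeft(0L)(_ + _)           // one start offset per partition
    parts.indices.flatMap { i =>
      parts(i)().zipWithIndex.map { case (v, j) => (v, starts(i) + j) } // pass 2
    }
  }

  def main(args: Array[String]): Unit = {
    // Partition 0 shrinks from 5 to 1 element between passes; partition 1 is stable.
    val zipped = zipWithIndexTwoPass(IndexedSeq(drifting(5, 1), drifting(2, 2)))
    println(s"elements: ${zipped.size}, indices: ${zipped.map(_._2).mkString(", ")}")
    // prints "elements: 3, indices: 0, 5, 6"
  }
}
```

The drifting case yields three elements but indices up to 6, which matches the "indices higher than the count" symptom. When every partition returns the same contents on each evaluation, the indices come out as exactly `0` to `count - 1`.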
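There is no way to guarantee 100% consistency.

When you call `persist()`, which uses the default `MEMORY_ONLY` storage level, Spark tries to cache every partition in memory if it fits. Partitions that do not fit are not cached and are recomputed from lineage each time an action needs them. Because your `flatMap` draws from `scala.util.Random`, a recomputed partition will generally contain different data, so `a` and `b` can differ.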