Key-value pair order in Spark
When applying a function such as reduceByKey, is there any way to specify a key other than the first element of the tuple?
My current solution consists of using a map function to rearrange the tuple into the correct order, but I assume that this additional operation comes at a computational cost, right?
To use reduceByKey, you need a key-value RDD[K,V], where K is the key that will be used. If you have an RDD[V], you first need to perform a map to specify the key:
myRdd.map(x => (x, 1))
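The map-then-reduce pattern can be sketched with plain Scala collections (no SparkContext needed); on a real RDD the equivalent call would be `myRdd.map(x => (x, 1)).reduceByKey(_ + _)`. The sample data here is illustrative:

```scala
// Word count: RDD[V] -> RDD[(K, V)] -> reduced by key.
// groupBy + sum simulates what reduceByKey(_ + _) does per partition and across partitions.
val words = Seq("spark", "scala", "spark")
val counts = words
  .map(x => (x, 1))                                    // attach a key to each element
  .groupBy(_._1)                                       // gather all pairs sharing a key
  .map { case (k, pairs) => (k, pairs.map(_._2).sum) } // combine the values per key
```

Note that unlike this local sketch, reduceByKey combines values on each partition before shuffling, which is why it is preferred over groupByKey for aggregations.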
If you already have an RDD[K,V] where the key is not what you want, you need another map; there is no other way around this. For instance, if you want to swap your key and your value, you could do the following:
myPairRdd.map(_.swap)
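The swap-then-reduce pattern above can be sketched locally with plain Scala collections; on a real RDD the chain would be `myPairRdd.map(_.swap).reduceByKey(_ + _)`. The sample pairs are illustrative:

```scala
// Start with pairs keyed by the "wrong" element: (value, key).
val pairs = Seq((1, "a"), (2, "b"), (1, "a"))
val swapped = pairs.map(_.swap)                        // now keyed by the second element
val reduced = swapped
  .groupBy(_._1)                                       // groupBy + sum ≈ reduceByKey(_ + _)
  .map { case (k, kvs) => (k, kvs.map(_._2).sum) }
```

The extra map does add a pass over the data, but it is a narrow (per-partition) transformation, so its cost is small compared to the shuffle that reduceByKey performs anyway.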
You can override the compare function by providing an implicit Ordering, and then call sortByKey:
implicit val sortFunction: Ordering[String] = new Ordering[String] {
  override def compare(a: String, b: String): Int = b.compareTo(a) // example: reverse lexicographic order
}
val rddSet: RDD[(String, String)] = sc.parallelize(dataSet)
rddSet.sortByKey()
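The effect of the implicit Ordering can be checked locally without Spark, since sortByKey resolves the same implicit Ordering[K] that plain Scala's sorted does. The sample keys here are illustrative:

```scala
// A local implicit Ordering takes precedence over the default Ordering[String],
// so both `sorted` here and `sortByKey` on an RDD would use it.
implicit val descending: Ordering[String] = new Ordering[String] {
  override def compare(a: String, b: String): Int = b.compareTo(a) // reverse order
}
val keys = Seq("banana", "apple", "cherry")
val sortedKeys = keys.sorted // picks up the implicit descending Ordering
```

Alternatively, sortByKey(ascending = false) flips the default order without defining a custom Ordering, if reverse order is all you need.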