
Key-value pair order in Spark

When applying a function such as reduceByKey, is there any way to specify a key other than the first element of the tuple?

My current solution consists in using a map function to rearrange the tuple into the correct order, but I assume that this additional operation comes at a computational cost, right?

To use reduceByKey, you need a key-value RDD[K,V] where K is the key that will be used. If you have an RDD[V], you need to perform a map first to specify the key.

myRdd.map(x => (x, 1))
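
For illustration, here is a minimal word-count style sketch of the map-then-reduceByKey pattern (the RDD name and its contents are hypothetical, and sc is an existing SparkContext):

val words = sc.parallelize(Seq("a", "b", "a", "c", "b", "a"))

// map each element to a (key, value) pair, then reduceByKey aggregates by key
val counts = words.map(word => (word, 1)).reduceByKey(_ + _)

counts.collect()  // e.g. Array((a,3), (c,1), (b,2))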

If you already have an RDD[K,V] where the key is not what you want, you need another map. There is no other way to get around this. For instance, if you want to switch your key and your value, you could do the following:

myPairRdd.map(_.swap)
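
As an illustration (the pair RDD and its contents are made up), swapping and then reducing lets you aggregate over what was originally the value:

val byId = sc.parallelize(Seq((1, "apple"), (2, "apple"), (3, "pear")))

// after the swap the fruit name is the key, so reduceByKey groups by it
val countsByName = byId.map(_.swap).mapValues(_ => 1).reduceByKey(_ + _)
// e.g. Array((apple,2), (pear,1))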

You can also override the compare function and call sortByKey:

import org.apache.spark.rdd.RDD

implicit val sortFunction: Ordering[String] = new Ordering[String] {
  // custom comparison logic goes here; a.compareTo(b) is just a placeholder example
  override def compare(a: String, b: String): Int = a.compareTo(b)
}

val rddSet: RDD[(String, String)] = sc.parallelize(dataSet)

rddSet.sortByKey()
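
Because the Ordering[String] is declared implicit, sortByKey picks it up automatically; for example, implementing compare as b.compareTo(a) would sort the keys in descending order.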
