
Key-value pair order in Spark

When applying a function such as reduceByKey, is there any way to specify a key other than the first element of the tuple?

My current solution consists in using a map function to rearrange the tuple into the correct order, but I assume that this additional operation comes at a computational cost, right?

To use reduceByKey, you need a key-value RDD[K,V] where K is the key that will be used. If you have an RDD[V], you need to perform a map first to specify the key.

myRdd.map(x => (x, 1))
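
For illustration, here is a minimal word-count style sketch of the map-then-reduceByKey pattern (the RDD name and its contents are hypothetical, and sc is an existing SparkContext):

val words = sc.parallelize(Seq("a", "b", "a", "c", "b", "a"))

// map each element to a (key, value) pair, then reduceByKey aggregates by key
val counts = words.map(word => (word, 1)).reduceByKey(_ + _)

counts.collect()  // e.g. Array((a,3), (c,1), (b,2))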

If you already have an RDD[K,V] where the key is not what you want, you need another map. There is no other way to get around this. For instance, if you want to switch your key and your value, you could do the following:

myPairRdd.map(_.swap)
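
As an illustration (the pair RDD and its contents are made up), swapping and then reducing lets you aggregate over what was originally the value:

val byId = sc.parallelize(Seq((1, "apple"), (2, "apple"), (3, "pear")))

// after the swap the fruit name is the key, so reduceByKey groups by it
val countsByName = byId.map(_.swap).mapValues(_ => 1).reduceByKey(_ + _)
// e.g. Array((apple,2), (pear,1))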

You can also override the compare function and call sortByKey:

import org.apache.spark.rdd.RDD

implicit val sortFunction: Ordering[String] = new Ordering[String] {
  // custom comparison logic goes here; a.compareTo(b) is just a placeholder example
  override def compare(a: String, b: String): Int = a.compareTo(b)
}

val rddSet: RDD[(String, String)] = sc.parallelize(dataSet)

rddSet.sortByKey()
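
Because the Ordering[String] is declared implicit, sortByKey picks it up automatically; for example, implementing compare as b.compareTo(a) would sort the keys in descending order.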
