
Merging RDDs using Scala Apache Spark

I have 2 RDDs.

    RDD1: ((String, String), Int)
    RDD2: (String, Int)

For example:

    RDD1

    ((A, X), 1)
    ((B, X), 2)
    ((A, Y), 2)
    ((C, Y), 3)

    RDD2

    (A, 6)
    (B, 7)
    (C, 8)

Expected output:

    ((A, X), 6)
    ((B, X), 14)
    ((A, Y), 12)
    ((C, Y), 24)

In RDD1, the (String, String) combination is unique, and in RDD2, every String key is unique. The score of A from RDD2 (6) gets multiplied with the score values of all entries in RDD1 that have A in their key:

6 = 6 * 1
14 = 7 * 2
12 = 6 * 2
24 = 8 * 3

I wrote the following, but it gives me an error on case:

    val finalRdd = countRdd.join(countfileRdd).map(case (k, (ls, rs)) => (k, (ls * rs)))

Can someone help me out on this?

Your first RDD doesn't have the same key type as the second RDD (the tuple (A, X) versus A), so the join won't type-check. You should transform it before joining:

val rdd1 = sc.parallelize(List((("A", "X"), 1), (("A", "Y"), 2)))
val rdd2 = sc.parallelize(List(("A", 6)))

// Re-key rdd1 by the first element of its tuple key so that both RDDs
// are keyed by the same type (String).
val rdd1Transformed = rdd1.map {
  case ((letter, coord), value) => (letter, (coord, value))
}

// Join on the String key, then rebuild the (letter, coord) key and
// multiply the two values.
val result = rdd1Transformed
  .join(rdd2)
  .map {
    case (letter, ((coord, v1), v2)) => ((letter, coord), v1 * v2)
  }

result.collect()
// res1: Array[((String, String), Int)] = Array(((A,X),6), ((A,Y),12))
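
A side note on the compile error itself: in Scala, a pattern-matching anonymous function (a bare case clause) must be wrapped in braces rather than parentheses, so map(case ...) is rejected by the parser before the key-type mismatch is even reported. A minimal illustration, reusing rdd2 from above:

rdd2.map { case (k, v) => (k, v * 2) }    // braces: compiles
// rdd2.map(case (k, v) => (k, v * 2))    // parentheses: syntax error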
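
When rdd2 is small enough to collect to the driver, as in this example, an alternative to the join is broadcasting it as a map, which avoids a shuffle. This is only a sketch, not part of the original answer, and it assumes every letter in rdd1 also appears in rdd2:

// Collect rdd2 into a Map on the driver and broadcast it to the executors.
val lookup = sc.broadcast(rdd2.collectAsMap())

// Multiply each value by the broadcast score for its letter,
// keeping the original (letter, coord) key.
val viaBroadcast = rdd1.map {
  case ((letter, coord), value) => ((letter, coord), value * lookup.value(letter))
}
viaBroadcast.collect()

If some letters might be missing from rdd2, lookup.value.getOrElse(letter, 0) (or filtering those entries out first) would be the safer lookup.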
