How to build a lookup function with multiple keys in Spark

I am new to Spark and asked a similar question last week. The code compiled but did not work, so I really don't know what to do. Here is my problem: I have a table A containing 3 columns, like this:

-----------
A1  A2  A3
-----------
a    b   c

and another table B, like this:

------------------------------------
B1  B2  B3  B4  B5  B6  B7  B8  B9
------------------------------------
1   a   3   4   5   b   7   8    c

My logic is: A1, A2 and A3 together are my key, and they correspond to B2, B6 and B9 in table B. I need to build a lookup function that takes A1, A2, A3 as the key and returns B8.

This is what I tried last week:

//getting the data in to dataframe
val clsrowRDD = clsfile.map(_.split("\t")).map(p => Row(p(0),p(1),p(2),p(3),p(4),p(5),p(6),p(7),p(8)))
val clsDataFrame = sqlContext.createDataFrame(clsrowRDD, clsschema)

//mapping the three key with the value
val smallRdd = clsDataFrame.rdd.map{row: Row => (mutable.WrappedArray.make[String](Array(row.getString(1), row.getString(5), row.getString(8))), row.getString(7))}

val lookupMap:Map[mutable.WrappedArray[String], String] = smallRdd.collectAsMap()

//build the look up function
def lookup(lookupMap: Map[mutable.WrappedArray[String],String]) =
udf((input: mutable.WrappedArray[String]) => lookupMap.lift(input))

//call the function
val combinedDF  = mstrDataFrame.withColumn("ENTP_CLS_CD",lookup(lookupMap)($"SRC_SYS_CD",$"ORG_ID",$"ORG_CD"))

This code compiles, but it doesn't return the results I need. I think that is because I am passing in an array as the key, and I don't actually have an array column in my table. But when I tried changing the map type to Map[(String,String,String),String], I didn't know how to pass the key to the function.

Tons of thanks.

If you are trying to get the B8 value for every row where A1 matches B2, A2 matches B6 and A3 matches B9, then a simple join and select should do the trick. Creating a lookup map would only add complexity.

As you explained, you have two dataframes, df1 and df2:

+---+---+---+
|A1 |A2 |A3 |
+---+---+---+
|a  |b  |c  |
+---+---+---+

+---+---+---+---+---+---+---+---+---+
|B1 |B2 |B3 |B4 |B5 |B6 |B7 |B8 |B9 |
+---+---+---+---+---+---+---+---+---+
|1  |a  |3  |4  |5  |b  |7  |8  |c  |
|1  |a  |3  |4  |5  |b  |7  |8  |e  |
+---+---+---+---+---+---+---+---+---+

A simple join and select can then be done:

df1.join(df2, $"A1" === $"B2" && $"A2" === $"B6" && $"A3" === $"B9", "inner").select("B8")

which should give you

+---+
|B8 |
+---+
|8  |
+---+
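One thing to keep in mind is that an inner join drops any row of df1 that has no match in df2, whereas a lookup would normally return an empty value for it. The following is a minimal, self-contained sketch of a left join that keeps such rows. It assumes a local SparkSession, that all columns are strings, and that the lookup side is small enough to broadcast; the object name JoinLookupSketch and the extra unmatched row ("x", "y", "z") are hypothetical and only there for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object JoinLookupSketch {
  // Assumption: a local session is enough for this illustration.
  val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("join-lookup-sketch")
    .getOrCreate()
  import spark.implicits._

  // df1 as in the answer, plus one hypothetical row with no match in df2.
  val df1 = Seq(("a", "b", "c"), ("x", "y", "z")).toDF("A1", "A2", "A3")

  // df2 as in the answer.
  val df2 = Seq(
    ("1", "a", "3", "4", "5", "b", "7", "8", "c"),
    ("1", "a", "3", "4", "5", "b", "7", "8", "e")
  ).toDF("B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "B9")

  // Keep only the key columns and the value column of the lookup side.
  val lookupSide = df2.select("B2", "B6", "B9", "B8")

  // Left join: every row of df1 survives; unmatched rows get a null B8.
  // broadcast(...) is only a hint and assumes the lookup side is small.
  val result = df1
    .join(broadcast(lookupSide),
      $"A1" === $"B2" && $"A2" === $"B6" && $"A3" === $"B9",
      "left")
    .select("A1", "A2", "A3", "B8")
}
```

Calling JoinLookupSketch.result.show() would display both rows of df1, with B8 filled in only for the row whose three keys match.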

I hope the answer is helpful.

Updated

From what I understood from your question and the comments below, you are confused about how to pass an array to your lookup udf function. For that you can use the array function. I have modified two parts of your almost perfect code to make it work: lookupMap is now typed as collection.Map, which is what collectAsMap returns, and the three key columns are wrapped in array(...) when the udf is called.

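import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{array, udf}
import scala.collection.mutable

// map the three key columns (B2, B6, B9) to the value column (B8)
val smallRdd = clsDataFrame.rdd
  .map { row: Row => (mutable.WrappedArray.make[String](Array(row.getString(1), row.getString(5), row.getString(8))), row.getString(7)) }

// collectAsMap returns a scala.collection.Map, so declare the type accordingly
val lookupMap: collection.Map[mutable.WrappedArray[String], String] = smallRdd.collectAsMap()

// build the lookup function; lift returns an Option, so a missing key becomes null
def lookup(lookupMap: collection.Map[mutable.WrappedArray[String], String]) =
  udf((input: mutable.WrappedArray[String]) => lookupMap.lift(input))

// call the function, wrapping the three key columns in a single array column
val combinedDF = mstrDataFrame.withColumn("ENTP_CLS_CD", lookup(lookupMap)(array($"SRC_SYS_CD", $"ORG_ID", $"ORG_CD")))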

You should have

+----------+------+------+-----------+
|SRC_SYS_CD|ORG_ID|ORG_CD|ENTP_CLS_CD|
+----------+------+------+-----------+
|a         |b     |c     |8          |
+----------+------+------+-----------+
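The question also asks how a map typed as Map[(String,String,String),String] could be used. Below is a minimal sketch of that idea in plain Scala, without Spark, so that only the tuple-key lookup is shown. The names TupleKeyLookup and lookupB8 are hypothetical, and the single map entry is taken from the sample row of table B.

```scala
object TupleKeyLookup {
  // (B2, B6, B9) -> B8, built from the sample row of table B.
  val lookupMap: Map[(String, String, String), String] = Map(
    ("a", "b", "c") -> "8"
  )

  // Takes the three key values separately and builds the tuple key itself.
  // Returns None when the key combination is not present.
  def lookupB8(a1: String, a2: String, a3: String): Option[String] =
    lookupMap.get((a1, a2, a3))

  // In Spark, the same function could be wrapped in a three-argument udf,
  // for example: udf((a: String, b: String, c: String) => lookupB8(a, b, c)),
  // and called with the three key columns passed directly, without array(...).
}
```

With a tuple key the three columns are passed as three separate arguments, so no array column is needed on the calling side.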
