How to build a lookup function with multiple keys in Spark
I am new to Spark and asked a similar question last week. The code compiled but did not work, so I really don't know what to do. Here is my problem: I have a table A containing 3 columns, like this:
-----------
A1 A2 A3
-----------
a b c
and another table B like this:
------------------------------------
B1 B2 B3 B4 B5 B6 B7 B8 B9
------------------------------------
1 a 3 4 5 b 7 8 c
My logic is: A1, A2 and A3 are my keys, and they correspond to B2, B6 and B9 in table B. I need to build a lookup function that takes A1, A2 and A3 as the key and returns B8.
This is what I tried last week:
//getting the data in to dataframe
val clsrowRDD = clsfile.map(_.split("\t")).map(p => Row(p(0),p(1),p(2),p(3),p(4),p(5),p(6),p(7),p(8)))
val clsDataFrame = sqlContext.createDataFrame(clsrowRDD, clsschema)
//mapping the three key with the value
val smallRdd = clsDataFrame.rdd.map{row: Row => (mutable.WrappedArray.make[String](Array(row.getString(1), row.getString(5), row.getString(8))), row.getString(7))}
val lookupMap:Map[mutable.WrappedArray[String], String] = smallRdd.collectAsMap()
//build the look up function
def lookup(lookupMap: Map[mutable.WrappedArray[String],String]) =
udf((input: mutable.WrappedArray[String]) => lookupMap.lift(input))
//call the function
val combinedDF = mstrDataFrame.withColumn("ENTP_CLS_CD",lookup(lookupMap)($"SRC_SYS_CD",$"ORG_ID",$"ORG_CD"))
This code compiles, but it doesn't return the results I need. I think it's because I am passing in an array as the key while my table doesn't actually contain an array column. But when I tried changing the map type to Map[(String,String,String),String], I didn't know how to pass it into the function.
Thanks a lot.
If you are trying to get the B8 value for every match of A1 with B2, A2 with B6, and A3 with B9, then a simple join and select should do the trick. Creating a lookup map only adds complexity.
As you explained, you have two dataframes, df1 and df2:
+---+---+---+
|A1 |A2 |A3 |
+---+---+---+
|a |b |c |
+---+---+---+
+---+---+---+---+---+---+---+---+---+
|B1 |B2 |B3 |B4 |B5 |B6 |B7 |B8 |B9 |
+---+---+---+---+---+---+---+---+---+
|1 |a |3 |4 |5 |b |7 |8 |c |
|1 |a |3 |4 |5 |b |7 |8 |e |
+---+---+---+---+---+---+---+---+---+
A simple join and select can then be done as
df1.join(df2, $"A1" === $"B2" && $"A2" === $"B6" && $"A3" === $"B9", "inner").select("B8")
which should give you
+---+
|B8 |
+---+
|8 |
+---+
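Since the original goal was to add a column to every row of the master table, a left join may be closer to what is needed than an inner join. The following is a minimal sketch, assuming Spark 2.x or later (SparkSession) and a lookup table small enough to broadcast. The object name JoinLookupSketch, the helper withLookup, and the extra unmatched row ("x", "y", "z") are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{broadcast, col}

object JoinLookupSketch {
  // Hypothetical helper: left-join B8 onto the key table.
  // Rows of `keys` without a match keep a null in B8.
  def withLookup(keys: DataFrame, lookupTable: DataFrame): DataFrame = {
    // Keep only the key columns and the value column, one row per key
    // (dropDuplicates keeps an arbitrary row when a key repeats).
    val slim = lookupTable
      .select("B2", "B6", "B9", "B8")
      .dropDuplicates("B2", "B6", "B9")

    keys
      .join(
        broadcast(slim), // hint: ship the small table to every executor
        col("A1") === col("B2") && col("A2") === col("B6") && col("A3") === col("B9"),
        "left")
      .select("A1", "A2", "A3", "B8")
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-lookup-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data mirroring the tables above, plus one unmatched row (assumed)
    val df1 = Seq(("a", "b", "c"), ("x", "y", "z")).toDF("A1", "A2", "A3")
    val df2 = Seq(
      ("1", "a", "3", "4", "5", "b", "7", "8", "c"),
      ("1", "a", "3", "4", "5", "b", "7", "8", "e")
    ).toDF("B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "B9")

    withLookup(df1, df2).show()
    spark.stop()
  }
}
```

With a left join, the row count of df1 is preserved as long as each key appears at most once in the lookup table, which is why the sketch deduplicates on the three key columns first.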
I hope the answer is helpful.
Updated
From what I understand of your question and the comments below, you are unsure how to pass an array to your lookup udf function. For that you can use the array function, which combines several columns into a single array column. I have modified a few parts of your almost perfect code to make it work:
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.{array, udf}
import scala.collection.mutable
//the $"..." column syntax needs the SQL implicits in scope, e.g. import sqlContext.implicits._
//mapping the three keys to the value
val smallRdd = clsDataFrame.rdd
.map{row: Row => (mutable.WrappedArray.make[String](Array(row.getString(1), row.getString(5), row.getString(8))), row.getString(7))}
val lookupMap: collection.Map[mutable.WrappedArray[String], String] = smallRdd.collectAsMap()
//build the lookup function
def lookup(lookupMap: collection.Map[mutable.WrappedArray[String],String]) =
udf((input: mutable.WrappedArray[String]) => lookupMap.lift(input))
//call the function, wrapping the three key columns into one array column
val combinedDF = mstrDataFrame.withColumn("ENTP_CLS_CD",lookup(lookupMap)(array($"SRC_SYS_CD",$"ORG_ID",$"ORG_CD")))
You should have
+----------+------+------+-----------+
|SRC_SYS_CD|ORG_ID|ORG_CD|ENTP_CLS_CD|
+----------+------+------+-----------+
|a |b |c |8 |
+----------+------+------+-----------+
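The question also asked how to use a Map[(String,String,String),String] instead of an array key. One possible way is a udf that takes three String arguments and builds the tuple itself, so the three columns can be passed directly. The following is a rough sketch, assuming Spark 2.x or later and a lookup table small enough to collect to the driver. The object name TupleKeyLookupSketch, the helper buildLookup, and the sample rows are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object TupleKeyLookupSketch {
  // Hypothetical helper: turn (key1, key2, key3, value) rows into a tuple-keyed map
  def buildLookup(rows: Seq[(String, String, String, String)]): Map[(String, String, String), String] =
    rows.map { case (k1, k2, k3, v) => (k1, k2, k3) -> v }.toMap

  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("tuple-key-lookup-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data (assumed), shaped like the tables in the question
    val clsDataFrame = Seq(("1", "a", "3", "4", "5", "b", "7", "8", "c"))
      .toDF("B1", "B2", "B3", "B4", "B5", "B6", "B7", "B8", "B9")
    val mstrDataFrame = Seq(("a", "b", "c"), ("x", "y", "z"))
      .toDF("SRC_SYS_CD", "ORG_ID", "ORG_CD")

    // Collect the three key columns and the value column to the driver
    val rows = clsDataFrame
      .select("B2", "B6", "B9", "B8")
      .as[(String, String, String, String)]
      .collect()
      .toSeq
    val lookupMap = buildLookup(rows)

    // Broadcast the map so each executor receives it once
    val lookupBc = spark.sparkContext.broadcast(lookupMap)

    // The udf takes the three key columns and returns an Option,
    // which Spark stores as null when the key is missing
    val lookup = udf((k1: String, k2: String, k3: String) => lookupBc.value.get((k1, k2, k3)))

    val combinedDF = mstrDataFrame
      .withColumn("ENTP_CLS_CD", lookup($"SRC_SYS_CD", $"ORG_ID", $"ORG_CD"))

    combinedDF.show()
    spark.stop()
  }
}
```

Tuples compare by value, so they work as map keys without the WrappedArray conversion. The trade-off is the same as with the array version: the whole lookup table has to fit in memory on the driver and on each executor.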