Conditional join, Spark Scala (between limits)
I have two dataframes.
df1:
Team, Sport, CostTicket
Stars, Football, 10
Circles, Football, 20
Stars, Basket, 12
Stars, Baseball, 14
Circles, Baseball, 25
and
df2:
Team, Sport, CostRange, LowerLimit, UpperLimit
Stars, Football, 0<3, 0, 3
Stars, Football, 4<10, 4, 10
Stars, Football, 11<22, 11, 22
Stars, Football, 24<25, 24, 25
Circles, Football, 0<4, 0, 4
Circles, Football, 5<10, 5, 10
Circles, Football, 11<20, 11, 20
Circles, Football, 21<30, 21, 30
Stars, Basket, 0<2, 0, 2
Stars, Basket, 3<7, 3, 7
Stars, Basket, 8<19, 8, 19
Stars, Basket, 20<30, 20, 30
Circles, Basket, 0<1, 0, 1
Circles, Basket, 2<4, 2, 4
Circles, Basket, 5<15, 5, 15
Circles, Basket, 16<30, 16, 30
Stars, Baseball, 0<10, 0, 10
Stars, Baseball, 11<20, 11, 20
Stars, Baseball, 21<30, 21, 30
Circles, Baseball, 0<4, 0, 4
Circles, Baseball, 5<10, 5, 10
Circles, Baseball, 11<20, 11, 20
Circles, Baseball, 21<30, 21, 30
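I want to add a fourth column to df1, taken from CostRange in df2: for each row, the range whose limits contain that row's CostTicket for the same Team and Sport.
The final result should be: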
Team, Sport, CostTicket, Range
Stars, Football, 10, 4<10
Circles, Football, 20, 11<20
Stars, Basket, 12, 8<19
Stars, Baseball, 14, 11<20
Circles, Baseball, 25, 21<30
I have gotten this far, but it does not work. Can anyone help me with this?
val df1 = df2.withColumn("Range", df2("CostRange"))
.where(df1("CostTicket") > df2("LowerLimit"))
.where(df1("CostTicket") < df2("UpperLimit"))
.where(df1("Team") === df2("Team"))
.where(df1("Sport") === df2("Sport"))
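You cannot refer to columns of two different dataframes in one expression unless the dataframes have been joined, so you need to join them first.
You can join on the two key columns and then filter with where, as follows: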
df1.join(df2, Seq("Team", "Sport"))
.where($"CostTicket" >= $"LowerLimit" && $"CostTicket" <= $"UpperLimit")
Or you can put the range check into the join condition itself, as follows:
df1.join(df2,
df1("Team") === df2("Team") &&
df1("Sport") === df2("Sport") &&
df1("CostTicket") >= df2("LowerLimit") &&
df1("CostTicket") <= df2("UpperLimit")
).drop(df2("Team"))
.drop(df2("Sport"))
Output:
+-------+--------+----------+---------+----------+----------+
|Team |Sport |CostTicket|CostRange|LowerLimit|UpperLimit|
+-------+--------+----------+---------+----------+----------+
|Stars |Football|10 |4<10 |4 |10 |
|Circles|Football|20 |11<20 |11 |20 |
|Stars |Basket |12 |8<19 |8 |19 |
|Stars |Baseball|14 |11<20 |11 |20 |
|Circles|Baseball|25 |21<30 |21 |30 |
+-------+--------+----------+---------+----------+----------+
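The output above still carries LowerLimit and UpperLimit, while the question asks for only Team, Sport, CostTicket and Range. Below is a minimal self-contained sketch of the first approach with that final select added. The object name RangeJoinSketch, the method run and the local SparkSession are assumptions made for illustration, df2 is cut down to a few rows per Team/Sport pair, and the sport in df1 is spelled "Football" so that it matches df2.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical wrapper object, used only to make the sketch runnable.
object RangeJoinSketch {

  // Builds the sample data, joins it and returns the collected rows.
  def run(spark: SparkSession): Seq[(String, String, Int, String)] = {
    import spark.implicits._ // enables toDF, the $"col" syntax and tuple encoders

    // df1 from the question (sport spelled "Football" to match df2).
    val df1 = Seq(
      ("Stars", "Football", 10),
      ("Circles", "Football", 20),
      ("Stars", "Basket", 12),
      ("Stars", "Baseball", 14),
      ("Circles", "Baseball", 25)
    ).toDF("Team", "Sport", "CostTicket")

    // A subset of df2: the matching range plus one non-matching range per pair.
    val df2 = Seq(
      ("Stars", "Football", "4<10", 4, 10),
      ("Stars", "Football", "11<22", 11, 22),
      ("Circles", "Football", "11<20", 11, 20),
      ("Circles", "Football", "21<30", 21, 30),
      ("Stars", "Basket", "8<19", 8, 19),
      ("Stars", "Basket", "20<30", 20, 30),
      ("Stars", "Baseball", "11<20", 11, 20),
      ("Stars", "Baseball", "21<30", 21, 30),
      ("Circles", "Baseball", "11<20", 11, 20),
      ("Circles", "Baseball", "21<30", 21, 30)
    ).toDF("Team", "Sport", "CostRange", "LowerLimit", "UpperLimit")

    df1
      // Equi-join on the key columns; each key column appears once in the result.
      .join(df2, Seq("Team", "Sport"))
      // Keep only the range that contains the ticket cost (both limits inclusive).
      .where($"CostTicket" >= $"LowerLimit" && $"CostTicket" <= $"UpperLimit")
      // Reduce to the four columns the question asks for.
      .select($"Team", $"Sport", $"CostTicket", $"CostRange".as("Range"))
      .as[(String, String, Int, String)]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell a `spark` value already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("range-join-sketch")
      .getOrCreate()
    try run(spark).foreach(println)
    finally spark.stop()
  }
}
```

Both limits are inclusive here, as in the answer; with the strict > and < from the question, the ticket costing 10 would not match the 4<10 range. An inner join also drops any df1 row whose CostTicket falls in a gap between ranges (23 for Stars/Football in the full df2, for example). If such rows should be kept with a null Range, one option is the join-condition form with "left" passed as the join type.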