
Conditional join, spark scala (between limits)

I have two DataFrames.

df1:

Team, Sport, CostTicket
Stars, Football, 10
Circles, Football, 20
Stars, Basket, 12
Stars, Baseball, 14
Circles, Baseball, 25

df2:

Team, Sport, CostRange, LowerLimit, UpperLimit
Stars, Football, 0<3, 0, 3
Stars, Football, 4<10, 4, 10
Stars, Football, 11<22, 11, 22
Stars, Football, 24<25, 24, 25
Circles, Football, 0<4, 0, 4
Circles, Football, 5<10, 5, 10
Circles, Football, 11<20, 11, 20
Circles, Football, 21<30, 21, 30
Stars, Basket, 0<2, 0, 2
Stars, Basket, 3<7, 3, 7
Stars, Basket, 8<19, 8, 19
Stars, Basket, 20<30, 20, 30
Circles, Basket, 0<1, 0, 1
Circles, Basket, 2<4, 2, 4
Circles, Basket, 5<15, 5, 15
Circles, Basket, 16<30, 16, 30
Stars, Baseball, 0<10, 0, 10
Stars, Baseball, 11<20, 11, 20
Stars, Baseball, 21<30, 21, 30
Circles, Baseball, 0<4, 0, 4
Circles, Baseball, 5<10, 5, 10
Circles, Baseball, 11<20, 11, 20
Circles, Baseball, 21<30, 21, 30

I want to add a fourth column to df1 using the CostRange from df2.

The final result should be:

Team, Sport, CostTicket, Range
Stars, Football, 10, 4<10
Circles, Football, 20, 11<20
Stars, Basket, 12, 8<19
Stars, Baseball, 14, 11<20
Circles, Baseball, 25, 21<30

I have gotten this far, but it does not work. Can anyone help me with this?

val df1 = df2.withColumn("Range", df2("CostRange"))
  .where(df1("CostTicket") > df2("LowerLimit"))
  .where(df1("CostTicket") < df2("UpperLimit"))
  .where(df1("Team") === df2("Team"))
  .where(df1("Sport") === df2("Sport"))

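You cannot reference columns from two different DataFrames in a single expression like this; you need to join the two DataFrames first.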

You can first join on the two shared columns and then filter with where, as follows:

// The $"..." column syntax requires `import spark.implicits._`
df1.join(df2, Seq("Team", "Sport"))
    .where($"CostTicket" >= $"LowerLimit" && $"CostTicket" <= $"UpperLimit")

Or you can specify the range check in the join condition itself, as follows:

df1.join(df2,
    df1("Team") === df2("Team") &&
    df1("Sport") === df2("Sport") &&
    df1("CostTicket") >= df2("LowerLimit") &&
    df1("CostTicket") <= df2("UpperLimit")
).drop(df2("Team"))
 .drop(df2("Sport"))

Output:

+-------+--------+----------+---------+----------+----------+
|Team   |Sport   |CostTicket|CostRange|LowerLimit|UpperLimit|
+-------+--------+----------+---------+----------+----------+
|Stars  |Football|10        |4<10     |4         |10        |
|Circles|Football|20        |11<20    |11        |20        |
|Stars  |Basket  |12        |8<19     |8         |19        |
|Stars  |Baseball|14        |11<20    |11        |20        |
|Circles|Baseball|25        |21<30    |21        |30        |
+-------+--------+----------+---------+----------+----------+
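
The output above still carries LowerLimit and UpperLimit, while the question asks for only a fourth column named Range. A minimal sketch of that last step, assuming a local SparkSession and integer limit columns; the object name `RangeJoinSketch` and the helper `withRange` are made up for illustration, and the sample rows are a subset of the data in the question:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object RangeJoinSketch {
  // Hypothetical helper: attach the matching CostRange to each ticket row as "Range".
  def withRange(tickets: DataFrame, ranges: DataFrame): DataFrame =
    tickets
      // Joining on a Seq of names keeps a single Team and Sport column.
      .join(ranges, Seq("Team", "Sport"))
      // between is inclusive on both ends, like the >= / <= filter above.
      .where(col("CostTicket").between(col("LowerLimit"), col("UpperLimit")))
      // Keep only the requested columns and rename CostRange to Range.
      .select(col("Team"), col("Sport"), col("CostTicket"), col("CostRange").as("Range"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for trying this out.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("range-join-sketch")
      .getOrCreate()
    import spark.implicits._

    val df1 = Seq(
      ("Stars", "Football", 10),
      ("Circles", "Football", 20),
      ("Stars", "Basket", 12)
    ).toDF("Team", "Sport", "CostTicket")

    val df2 = Seq(
      ("Stars", "Football", "4<10", 4, 10),
      ("Stars", "Football", "11<22", 11, 22),
      ("Circles", "Football", "11<20", 11, 20),
      ("Circles", "Football", "21<30", 21, 30),
      ("Stars", "Basket", "8<19", 8, 19)
    ).toDF("Team", "Sport", "CostRange", "LowerLimit", "UpperLimit")

    withRange(df1, df2).show(false)
    spark.stop()
  }
}
```

Note that the ranges in df2 have gaps (for Stars/Football, nothing covers 23). With an inner join, a ticket whose cost falls in such a gap is dropped from the result; if those rows should be kept with a null Range, one option is to put the range check in the join condition and pass "left" as the join type.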

