Return combined Dataset after joinWith in Spark Scala
Given the two Spark Datasets below, flights and capitals, what would be the most efficient way to return the combined (i.e. "joined") result without first converting to a DataFrame or writing out all the columns by name in a .select() call?
I know, for example, that I can access either side of the tuple (e.g. with .map(x => x._1)) or use the * operator with:
result.select("_1.*","_2.*")
But the latter may result in duplicate column names, and I'm hoping for a cleaner solution.
Thank you for your help.
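import spark.implicits._ // assumes a SparkSession named spark; provides toDF and as[...] (already imported in spark-shell)
case class Flights(tripNumber: Int, destination: String)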
case class Capitals(state: String, capital: String)
val flights = Seq(
(55, "New York"),
(3, "Georgia"),
(12, "Oregon")
).toDF("tripNumber","destination").as[Flights]
val capitals = Seq(
("New York", "Albany"),
("Georgia", "Atlanta"),
("Oregon", "Salem")
).toDF("state","capital").as[Capitals]
val result = flights.joinWith(capitals,flights.col("destination")===capitals.col("state"))
There are 2 options, but you will have to use join instead of joinWith:
The first option uses a convenient part of the Dataset API: you can drop one of the join columns, so there is no need to repeat the projected columns in a select:
val result = flights.join(capitals, flights("destination") === capitals("state")).drop(capitals("state"))
The second option is to rename the join column so that it has the same name in both datasets, and to specify the join in a slightly different way:
val result = flights.join(capitals.withColumnRenamed("state", "destination"), Seq("destination"))
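Output (shown for the second option; the first returns the same rows with the columns ordered tripNumber, destination, capital):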
result.show
+-----------+----------+-------+
|destination|tripNumber|capital|
+-----------+----------+-------+
|   New York|        55| Albany|
|    Georgia|         3|Atlanta|
|     Oregon|        12|  Salem|
+-----------+----------+-------+
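Both options above return a DataFrame (Dataset[Row]). If the result should remain a typed Dataset, two possibilities are to keep joinWith and flatten each pair by hand, or to re-attach a type to the join result with as[...]. The following is only a minimal sketch: the FlightWithCapital case class, the TypedJoinSketch object and the local SparkSession are assumptions made for illustration and do not come from the original post.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Case classes are declared at the top level so Spark can derive encoders for them.
case class Flights(tripNumber: Int, destination: String)
case class Capitals(state: String, capital: String)
// Hypothetical combined record type; not part of the original question.
case class FlightWithCapital(tripNumber: Int, destination: String, capital: String)

object TypedJoinSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("typed-join-sketch")
      .getOrCreate()
    import spark.implicits._

    val flights = Seq((55, "New York"), (3, "Georgia"), (12, "Oregon"))
      .toDF("tripNumber", "destination").as[Flights]
    val capitals = Seq(("New York", "Albany"), ("Georgia", "Atlanta"), ("Oregon", "Salem"))
      .toDF("state", "capital").as[Capitals]

    // Variant A: keep joinWith and flatten each (Flights, Capitals) pair by hand.
    val viaJoinWith: Dataset[FlightWithCapital] = flights
      .joinWith(capitals, flights("destination") === capitals("state"))
      .map { case (f, c) => FlightWithCapital(f.tripNumber, f.destination, c.capital) }

    // Variant B: untyped join, drop the duplicate key column, then re-attach a type.
    // as[...] resolves columns by name, so they must match the case class fields.
    val viaJoin: Dataset[FlightWithCapital] = flights
      .join(capitals, flights("destination") === capitals("state"))
      .drop(capitals("state"))
      .as[FlightWithCapital]

    viaJoinWith.show()
    viaJoin.show()

    spark.stop()
  }
}
```

Variant A names the fields once in the map but is checked by the compiler. Variant B is shorter, but a mismatch between column names and case class fields only surfaces at runtime.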