
How to join two Datasets using only Dataset API

I am struggling to flatten the dataset that results from joining two other datasets. Below is my code:

    val family = Seq(
      Person(0, "Agata", 0),
      Person(1, "Iweta", 0),
      Person(2, "Patryk", 2),
      Person(3, "Maksym", 0)).toDS
    val cities = Seq(
      City(0, "Warsaw"),
      City(1, "Washington"),
      City(2, "Sopot")).toDS

Then the join:

    val joined = family.joinWith(cities, family("cityId") === cities("id"), "crossjoin")

The result obtained is:

joined: org.apache.spark.sql.Dataset[(Person, City)]

+------------+----------+
|          _1|        _2|
+------------+----------+
| [0,Agata,0]|[0,Warsaw]|
| [1,Iweta,0]|[0,Warsaw]|
|[2,Patryk,2]| [2,Sopot]|
|[3,Maksym,0]|[0,Warsaw]|
+------------+----------+

I want to flatten this and get the following dataset:

val output: Dataset =
[0,Agata,0,Warsaw]
[1,Iweta,0,Warsaw]
[2,Patryk,2,Sopot]
[3,Maksym,0,Warsaw]

Any idea how to do this without using the DataFrame API? I want it to be done entirely with the Dataset API. Thanks a lot for your help. Best regards

Using join itself, you will get the output you want.

    family.join(cities, family("cityId") === cities("id")).drop("id")

Sample output:

+--------+------+--------+
|cityName|cityId|cityName|
+--------+------+--------+
|   Agata|     0|  Warsaw|
|   Iweta|     0|  Warsaw|
|  Patryk|     2|   Sopot|
|  Maksym|     0|  Warsaw|
+--------+------+--------+
...
val joined = family.join(cities).where(family("cityid") === cities("id")).drop("id") // adding this means using the DataFrame (DF) API
joined.show
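
As a rough illustration of the remark that this route goes through the DataFrame API: `join` returns a `DataFrame`, and the result can be converted back to a typed `Dataset` with `.as[T]`. The sketch below is not from the thread. The `Person` and `City` case classes are assumed (the question does not show their definitions), and `PersonWithCity`, `cityKey`, and `JoinThenTyped` are hypothetical names introduced only for this example. The `City` columns are renamed before the join so that the joined rows have unique column names that `.as[PersonWithCity]` can match.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Assumed definitions: the question does not show Person or City.
case class Person(id: Int, name: String, cityId: Int)
case class City(id: Int, name: String)
// Hypothetical flattened type, introduced only for this sketch.
case class PersonWithCity(id: Int, name: String, cityId: Int, cityName: String)

object JoinThenTyped {
  def joinTyped(spark: SparkSession): Dataset[PersonWithCity] = {
    import spark.implicits._

    val family = Seq(
      Person(0, "Agata", 0),
      Person(1, "Iweta", 0),
      Person(2, "Patryk", 2),
      Person(3, "Maksym", 0)).toDS()
    val cities = Seq(
      City(0, "Warsaw"),
      City(1, "Washington"),
      City(2, "Sopot")).toDS()

    // Rename the City columns first so the join result has no duplicate names.
    val citiesRenamed: DataFrame = cities
      .withColumnRenamed("id", "cityKey")
      .withColumnRenamed("name", "cityName")

    // join is untyped: the result is a DataFrame, i.e. Dataset[Row].
    val joined: DataFrame = family
      .join(citiesRenamed, family("cityId") === citiesRenamed("cityKey"))
      .drop("cityKey")

    // Convert back to a typed Dataset; columns are matched by name.
    joined.as[PersonWithCity]
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("join-then-typed")
      .getOrCreate()
    joinTyped(spark).show()
    spark.stop()
  }
}
```

The join step itself is still untyped here; only the final result is a `Dataset[PersonWithCity]`.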

The experimental status of joinWith means that its use should not yet be formally considered.

DataFrame (DF) and Dataset (DS) are growing towards each other, so it should not matter. I believe what you want is not actually possible.

Also, a DataFrame is described as simply a Dataset[Row].
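
For comparison, one way that might get closer to the flattened result asked for in the question is to keep the `joinWith` pair and apply the typed `map` transformation to each `(Person, City)` tuple. This is only a sketch under stated assumptions, not a method confirmed by the thread: the `Person` and `City` definitions are assumed, `PersonWithCity` and `FlattenJoinWith` are hypothetical names, and `"inner"` is used as the join type here instead of the `"crossjoin"` string from the question. The join condition is still written with `Column` expressions, since `joinWith` takes a `Column`.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Assumed definitions: the question does not show Person or City.
case class Person(id: Int, name: String, cityId: Int)
case class City(id: Int, name: String)
// Hypothetical flattened type, introduced only for this sketch.
case class PersonWithCity(id: Int, name: String, cityId: Int, cityName: String)

object FlattenJoinWith {
  def flatten(spark: SparkSession): Dataset[PersonWithCity] = {
    import spark.implicits._

    val family = Seq(
      Person(0, "Agata", 0),
      Person(1, "Iweta", 0),
      Person(2, "Patryk", 2),
      Person(3, "Maksym", 0)).toDS()
    val cities = Seq(
      City(0, "Warsaw"),
      City(1, "Washington"),
      City(2, "Sopot")).toDS()

    // joinWith keeps both sides as objects: Dataset[(Person, City)].
    val joined: Dataset[(Person, City)] =
      family.joinWith(cities, family("cityId") === cities("id"), "inner")

    // Typed map: build one flat record from each (Person, City) pair.
    joined.map { case (p, c) =>
      PersonWithCity(p.id, p.name, p.cityId, c.name)
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatten-joinWith")
      .getOrCreate()
    flatten(spark).show()
    spark.stop()
  }
}
```

The result type is `Dataset[PersonWithCity]`, so later typed operations such as `filter(_.cityName == "Warsaw")` stay checked at compile time.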
