Create a new DF using another two
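I have two DataFrames that share a colour column. I want to build a new DF that adds a column holding the code that corresponds to each row's colour, as you can see: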
DF1
+------------+--------------------+
| Code | colour |
+------------+--------------------+
| 1001 | brown |
| 1201 | black |
| 1300 | green |
+------------+--------------------+
DF2
+------------+--------------------+-----------+
| Name | colour | date |
+------------+--------------------+-----------+
| Joee | brown | 20210101 |
| Jess | black | 20210101 |
| James | green | 20210101 |
+------------+--------------------+-----------+
Output:
+------------+--------------------+-----------+----------+
| Name | colour | date | Got |
+------------+--------------------+-----------+----------+
| Joee | black | 20210101 | 1201 |
| Jess | brown | 20210101 | 1001 |
| James | blue | 20210101 | 092 |
+------------+--------------------+-----------+----------+
How can I do this? With a join?
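As mck suggested, a simple SQL join is enough for your case. Explicitly specify that the values of the colour column must be equal between the two DataFrames, as shown below (we drop one of the two colour columns, because after the join both hold the same value in every row):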
// Inner join on equal colour values, then drop df1's duplicate colour column
val joined = df1.join(df2, df1("colour").equalTo(df2("colour")))
  .drop(df1("colour"))
This is what we get after calling show on the newly formed joined DataFrame:
+----+-----+------+--------+
|code| name|colour| date|
+----+-----+------+--------+
|1001| Joe| brown|20210101|
|1201| Jess| black|20210101|
|1300|James| green|20210101|
+----+-----+------+--------+
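If you want the column layout from the question (Name, colour, date, Got), one possible variation is sketched below. It is only a sketch under some assumptions: the object name ColourJoinSketch and the local SparkSession settings are made up for illustration, the two DataFrames are rebuilt by hand from the tables in the question, and a left join is used so that rows of DF2 whose colour has no match in DF1 are kept with a null Got instead of being dropped.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name, used only to make the sketch self-contained.
object ColourJoinSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation; adjust master/appName for a real cluster.
    val spark = SparkSession.builder()
      .appName("colour-join-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // DF1 from the question: colour -> code lookup.
    val df1 = Seq(
      (1001, "brown"),
      (1201, "black"),
      (1300, "green")
    ).toDF("Code", "colour")

    // DF2 from the question: the rows to enrich.
    val df2 = Seq(
      ("Joee", "brown", "20210101"),
      ("Jess", "black", "20210101"),
      ("James", "green", "20210101")
    ).toDF("Name", "colour", "date")

    // Joining on Seq("colour") keeps a single colour column, so no drop is needed.
    // The "left" join type keeps every row of df2 even when no code matches.
    val result = df2
      .join(df1, Seq("colour"), "left")
      .select($"Name", $"colour", $"date", $"Code".as("Got"))

    result.show()
    spark.stop()
  }
}
```

Passing the join column as Seq("colour") instead of an equality expression is a design choice: Spark then emits the shared column once, which avoids the ambiguous-column handling that the explicit drop takes care of in the version above.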