Scala Spark, compare two DataFrames and select the value of another column
I have two DataFrames. What I want to do exactly is:
If the column Name is "P", then I have to select the FinalValue column of DF2 where the id_1 column of DF1 matches the Id_name column of DF2; otherwise I have to fill it with nulls.
For example, I have the following DataFrames (DF1 and DF2):
df1
+--------+-------+-------+
|Name    | value | id_1  |
+--------+-------+-------+
|P       | 5     | being |
|X       | 1     | dose  |
|Z       | 1     | yex   |
+--------+-------+-------+
df2
+--------+------------+
|Id_name | FinalValue |
+--------+------------+
|ash     | h32        |
|being   | c11        |
|dose    | g21        |
+--------+------------+
In this case the output should be:
+--------+-------+-------------+
|Name    | value | FinalValue  |
+--------+-------+-------------+
|P       | 5     | c11         |
|X       | 1     | null        |
|Z       | 1     | null        |
+--------+-------+-------------+
What I am trying is the following:
var df3 = df1.withColumn("FinalValue", when($"Name" === "P", df2.select(...)))
But as you can see, I don't know how to continue, because if I select a column of DF2 I can't also select a column of DF1. How can I do this?
Maybe my explanation is not good enough; if you need more information or explanation, just tell me. Thanks in advance.
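For reference, the example DataFrames above can be reproduced with a small setup like this (the local SparkSession and the app name are assumptions for a standalone sketch, not part of the question):

```scala
import org.apache.spark.sql.SparkSession

// Local session only to reproduce the example; adjust for your environment.
val spark = SparkSession.builder
  .master("local[*]")
  .appName("df-compare-sketch")
  .getOrCreate()
import spark.implicits._

// Same data as the df1 / df2 tables shown above
val df1 = Seq(("P", 5, "being"), ("X", 1, "dose"), ("Z", 1, "yex"))
  .toDF("Name", "value", "id_1")
val df2 = Seq(("ash", "h32"), ("being", "c11"), ("dose", "g21"))
  .toDF("Id_name", "FinalValue")
```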
You can do a left join, then mask the final value using `when`:
val df3 = df1.join(
  df2,
  df1("id_1") === df2("Id_name"),
  "left"
).select(
  // keep every df1 column except the join key id_1, then append
  // FinalValue, masked to null unless Name is "P"
  df1.columns.dropRight(1).map(col) :+
    when($"Name" === "P", $"FinalValue").as("FinalValue")
  : _*
)
df3.show
+----+-----+----------+
|Name|value|FinalValue|
+----+-----+----------+
| P| 5| c11|
| X| 1| null|
| Z| 1| null|
+----+-----+----------+
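An alternative sketch, assuming the df1 / df2 from the question and `spark.implicits._` in scope: instead of masking after the join, null out the join key first so that only the "P" rows can match. `when` without an `otherwise` yields null, and null keys never match in a join, so the non-"P" rows come through the left join with a null FinalValue. The intermediate column name `join_key` is a hypothetical choice:

```scala
import org.apache.spark.sql.functions.when

val df3b = df1
  // join_key equals id_1 only for "P" rows, null otherwise
  .withColumn("join_key", when($"Name" === "P", $"id_1"))
  .join(df2, $"join_key" === df2("Id_name"), "left")
  .select("Name", "value", "FinalValue")
```

This avoids comparing every df1 row against df2: rows that can never take a FinalValue carry a null key into the join instead of matching and then being masked.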