Scala Spark, compare two DataFrames and select the value of another column

I have two DataFrames. What I want to do is this:

If the column Name is "P", I have to select the column FinalValue from DF2 where the column id_1 of DF1 matches the column Id_name of DF2; otherwise I have to fill it with nulls.

For example, I have the following DataFrames (DF1 and DF2):

df1
+--------+-------+-------+
|Name    | value | id_1  |
+--------+-------+-------+
|P       |5      | being |
|X       |1      | dose  |
|Z       |1      | yex   |
+--------+-------+-------+

df2
+--------+------------+
|Id_name | FinalValue |
+--------+------------+
|ash     | h32        |
|being   | c11        |
|dose    | g21        |
+--------+------------+

In this case the output should be:

+--------+-------+-------------+
|Name    | value | FinalValue  |
+--------+-------+-------------+
|P       |5      | c11         |
|X       |1      | null        |
|Z       |1      | null        |
+--------+-------+-------------+

What I am trying is the following:

var df3 = df1.withColumn("FinalValue", when($"Name" === "P", df2.select(...)))

But as you can see, I don't know how to continue, because if I select a column of DF2 I can't also select a column of DF1. How can I do this?

If my explanation is not clear enough and you need more information, just tell me. Thanks in advance.

You can do a left join, then mask the final value using when:

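import org.apache.spark.sql.functions.{col, when}
// Assumes spark.implicits._ is in scope (as in spark-shell) for the $"..." syntax.

val df3 = df1.join(
    df2,
    df1("id_1") === df2("Id_name"),  // match DF1.id_1 against DF2.Id_name
    "left"                           // keep every DF1 row, even without a match
).select(
    // all DF1 columns except the last one (id_1) ...
    df1.columns.dropRight(1).map(col) :+
    // ... plus FinalValue, which is null unless Name is "P"
    when($"Name" === "P", $"FinalValue").as("FinalValue")
    : _*
)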

df3.show
+----+-----+----------+
|Name|value|FinalValue|
+----+-----+----------+
|   P|    5|       c11|
|   X|    1|      null|
|   Z|    1|      null|
+----+-----+----------+
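
For anyone who wants to try this outside spark-shell, here is a minimal self-contained sketch of the same left-join-then-mask idea. The object name MaskedLeftJoinExample, the helper names joinThenMask and joinWithCondition, the local SparkSession settings, and the choice of Int for the value column are all assumptions made for illustration; they are not part of the question or the answer. The second helper is a variant, not the answer's method: it moves the Name === "P" test into the join condition, so that rows with any other Name never match and get a null FinalValue from the left join itself.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, when}

// Hypothetical names throughout; a sketch, not a definitive implementation.
object MaskedLeftJoinExample {

  // Left join DF1 to DF2 on id_1 === Id_name, then keep FinalValue only where Name is "P".
  def joinThenMask(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.join(df2, df1("id_1") === df2("Id_name"), "left")
      .select(
        col("Name"),
        col("value"),
        // when(...) without otherwise(...) yields null for rows that fail the condition
        when(col("Name") === "P", col("FinalValue")).as("FinalValue")
      )

  // Variant: require Name === "P" inside the join condition instead of masking afterwards.
  // Rows with another Name find no match, so the left join leaves FinalValue null.
  def joinWithCondition(df1: DataFrame, df2: DataFrame): DataFrame =
    df1.join(df2, df1("id_1") === df2("Id_name") && df1("Name") === "P", "left")
      .select(col("Name"), col("value"), col("FinalValue"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local session is good enough for a quick check.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("masked-left-join")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question; value is assumed to be an Int.
    val df1 = Seq(("P", 5, "being"), ("X", 1, "dose"), ("Z", 1, "yex"))
      .toDF("Name", "value", "id_1")
    val df2 = Seq(("ash", "h32"), ("being", "c11"), ("dose", "g21"))
      .toDF("Id_name", "FinalValue")

    joinThenMask(df1, df2).show()
    joinWithCondition(df1, df2).show()

    spark.stop()
  }
}
```

Both helpers should give the rows shown in the question's expected output for this sample data, although Spark does not guarantee row order. Note that if Id_name is not unique in DF2, either join will produce one output row per matching DF2 row, so you may want to deduplicate DF2 first.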
