
Compare columns from two different dataframes based on id

I have two dataframes to compare. The records are in a different order, and the column names may differ. I have to compare several columns based on the unique key (id).

Example: consider dataframes df1 and df2

df1:

+---+-------+-----+
| id|student|marks|
+---+-------+-----+
|  1|  Vijay|   23|
|  4| Vithal|   24|
|  2|    Ram|   21|
|  3|  Rahul|   25|
+---+-------+-----+

df2:

+-----+--------+------+
|newId|student1|marks1|
+-----+--------+------+
|    3|   Rahul|    25|
|    2|     Ram|    23|
|    1|   Vijay|    23|
|    4|  Vithal|    24|
+-----+--------+------+

Here, based on id and newId, I need to compare the student name and marks values, and check whether the student with the same id has the same name and marks in both dataframes.

In this example, the student with id 2 has 21 marks in df1 but 23 marks in df2.

df1.exceptAll(df2).show()
// +---+-------+-----+                                                             
// | id|student|marks|
// +---+-------+-----+
// |  2|    Ram|   21|
// +---+-------+-----+
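exceptAll compares whole rows by column position, so it works here because both dataframes list their columns in the same order. If you want the comparison driven explicitly by the key, a join on id and newId is one option. The following is only a minimal sketch, assuming a local SparkSession and the sample data from the question; the object name CompareById and the value name mismatches are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

object CompareById {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption: not the asker's setup).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("compare-by-id")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question.
    val df1 = Seq((1, "Vijay", 23), (4, "Vithal", 24), (2, "Ram", 21), (3, "Rahul", 25))
      .toDF("id", "student", "marks")
    val df2 = Seq((3, "Rahul", 25), (2, "Ram", 23), (1, "Vijay", 23), (4, "Vithal", 24))
      .toDF("newId", "student1", "marks1")

    // Pair rows by key, then keep only the pairs where a compared column differs.
    val mismatches = df1
      .join(df2, $"id" === $"newId", "inner")
      .filter($"student" =!= $"student1" || $"marks" =!= $"marks1")

    mismatches.show()
    spark.stop()
  }
}
```

For the sample data this should leave only the pair for id 2 (marks 21 in df1, 23 in df2). Two caveats: an inner join ignores ids that exist in only one dataframe (a "full_outer" join would surface those), and =!= evaluates to null when either side is null, so the null-safe operator <=> may be a better fit if the columns can contain nulls.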

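I think diff will give the result you are looking for. Note that diff is a method on Scala collections, not on DataFrame, so the snippet below assumes df1 and df2 already hold the rows as Scala sequences (for example, after collecting them to the driver).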

scala> df1.diff(df2)
res0: Seq[org.apache.spark.sql.Row] = List([2,Ram,21])
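To show what diff does in isolation, here is a small Spark-free sketch. It assumes the rows are modelled as plain tuples, which is an illustration only; with real dataframes the lists might instead come from something like df1.collect().toList, which pulls every row to the driver and so only suits small data. The names DiffSketch, rows1, rows2 and onlyInFirst are hypothetical.

```scala
object DiffSketch extends App {
  // Rows from the question, modelled as (id, student, marks) tuples.
  val rows1 = List((1, "Vijay", 23), (4, "Vithal", 24), (2, "Ram", 21), (3, "Rahul", 25))
  val rows2 = List((3, "Rahul", 25), (2, "Ram", 23), (1, "Vijay", 23), (4, "Vithal", 24))

  // diff keeps the elements of rows1 that have no equal element in rows2;
  // element order in either list does not affect which rows are kept.
  val onlyInFirst = rows1.diff(rows2)

  println(onlyInFirst) // List((2,Ram,21))
}
```

Like exceptAll, this compares whole rows rather than matching on the key, so a row appears in the result whenever any of its values differs.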


 