Compare two DataFrames and add mismatched values as a new column in Spark

I get the difference between the two sets of records with:

df1.except(df2)

It gives results like this:

[Screenshot: output of df1.except(df2)]
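For reference, `except` only reports whole rows of `df1` that have no identical row in `df2` (with set semantics, so duplicates are removed); it does not say which column differs, which is why the question below asks for more. A small sketch with made-up data; the `id`, `name` and `lastname` columns are hypothetical stand-ins for the question's tables:

```scala
import org.apache.spark.sql.SparkSession

// Local session for the sketch; in spark-shell, getOrCreate() returns the existing session
val spark = SparkSession.builder().master("local[*]").appName("except-sketch").getOrCreate()
import spark.implicits._

// Hypothetical sample data: rows 2 and 3 differ in one column each
val df1 = Seq((1, "John", "Smith"), (2, "Anna", "Lee"), (3, "Mark", "Brown")).toDF("id", "name", "lastname")
val df2 = Seq((1, "John", "Smith"), (2, "Anna", "Li"), (3, "Marc", "Brown")).toDF("id", "name", "lastname")

// Rows of df1 with no exact match in df2; the result says nothing about which column changed
val diff = df1.except(df2)
diff.orderBy("id").show()
```

Only the rows with `id` 2 and 3 come back, each as a complete row from `df1`.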

How can I compare two DataFrames to find what changed, in which row and in which column, and add that information as a new column? The expected output looks like this:

[Screenshot: expected output]

Join the two DataFrames on the primary key, then use withColumn with a UDF that receives both column values (old and new). The UDF compares them and returns the new value if they are not the same, otherwise an empty string.

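import org.apache.spark.sql.functions.udf

// Return the new value when it differs from the old one, otherwise an empty string
val check = udf((old_val: String, new_val: String) => if (old_val == new_val) "" else new_val)

// df is the joined DataFrame holding the old (name, lastname) and new (new_name, new_lastname) columns
val df_check = df
  .withColumn("Check_Name", check(df.col("name"), df.col("new_name")))
  .withColumn("Check_Namelast", check(df.col("lastname"), df.col("new_lastname")))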
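The snippet above assumes `df` is already the joined DataFrame, with the new columns renamed to `new_name` and `new_lastname`. A minimal end-to-end sketch that also shows the join step, assuming a hypothetical `id` primary key and made-up sample rows:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

val spark = SparkSession.builder().master("local[*]").appName("udf-compare-sketch").getOrCreate()
import spark.implicits._

// Hypothetical old and new versions of the same table; "id" plays the primary key
val oldDf = Seq((1, "John", "Smith"), (2, "Anna", "Lee"), (3, "Mark", "Brown")).toDF("id", "name", "lastname")
val newDf = Seq((1, "John", "Smith"), (2, "Anna", "Li"), (3, "Marc", "Brown"))
  .toDF("id", "name", "lastname")
  .withColumnRenamed("name", "new_name")         // rename so old and new columns can coexist
  .withColumnRenamed("lastname", "new_lastname")

// Join on the primary key so each row holds both the old and the new values
val df = oldDf.join(newDf, Seq("id"))

// Return the new value when it differs from the old one, otherwise an empty string
val check = udf((old_val: String, new_val: String) => if (old_val == new_val) "" else new_val)

val df_check = df
  .withColumn("Check_Name", check(col("name"), col("new_name")))
  .withColumn("Check_Namelast", check(col("lastname"), col("new_lastname")))

df_check.orderBy("id").show()
```

Row 2 gets `Li` in `Check_Namelast`, row 3 gets `Marc` in `Check_Name`, and unchanged cells stay empty.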

Alternatively, write a def function that loops over the collected rows:

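import org.apache.spark.sql.DataFrame

// Compares two DataFrames row by row and returns (primary key, Remarks), where Remarks
// lists every column whose value changed. Assumes both DataFrames have the same schema,
// the same row order (e.g. both sorted by the key) and a non-null primary key in the first column.
// collect() pulls all rows to the driver, so this only suits small DataFrames.
def fn(old_df: DataFrame, new_df: DataFrame): DataFrame = {
  val spark = old_df.sparkSession
  import spark.implicits._

  val old_df_array = old_df.collect() // make df to array to loop through
  val new_df_array = new_df.collect() // make df to array to loop through
  val column_names = old_df.columns

  // loop through all rows, pairing old and new rows by position
  val rows = old_df_array.zip(new_df_array).map { case (oldRow, newRow) =>
    // loop through all columns and record the ones whose value changed
    val value_change = column_names.indices
      .filter(j => oldRow.get(j) != newRow.get(j))
      .map(j => s"${column_names(j)} has value changed")
      .mkString(", ") // all changes of this row in one string
    (oldRow.get(0).toString, value_change) // (primary key, Remarks column)
  }

  // convert array to df
  rows.toSeq.toDF(column_names(0), "Remarks")
}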
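Because the function above collects both DataFrames to the driver, it will not scale to large tables. A sketch of a different technique (not the answer's own): join on the key and build the Remarks column with the built-in `when`, `<=>` (null-safe equality) and `concat_ws`, so the comparison runs inside Spark. It assumes both DataFrames share column names and a key column; the helper name `changedColumns` and the sample data are hypothetical. An inner join drops keys that exist on only one side, so a full outer join would be needed to report inserted or deleted rows.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat_ws, lit, when}

val spark = SparkSession.builder().master("local[*]").appName("remarks-sketch").getOrCreate()
import spark.implicits._

// Hypothetical helper: join on `key` and list the changed columns in a "Remarks" column
def changedColumns(oldDf: DataFrame, newDf: DataFrame, key: String): DataFrame = {
  val valueCols = oldDf.columns.toSeq.filterNot(_ == key)
  val joined = oldDf.as("o").join(newDf.as("n"), col(s"o.$key") === col(s"n.$key"))
  // One expression per column: the remark when the values differ (null-safe), otherwise null
  val remarks: Seq[Column] = valueCols.map { c =>
    when(!(col(s"o.$c") <=> col(s"n.$c")), lit(s"$c has value changed"))
  }
  // concat_ws skips nulls, so unchanged columns contribute nothing
  joined.select(col(s"o.$key").as(key), concat_ws(", ", remarks: _*).as("Remarks"))
}

val oldDf = Seq((1, "John", "Smith"), (2, "Anna", "Lee"), (3, "Mark", "Brown")).toDF("id", "name", "lastname")
val newDf = Seq((1, "John", "Smith"), (2, "Anna", "Li"), (3, "Marc", "Lopez")).toDF("id", "name", "lastname")

val result = changedColumns(oldDf, newDf, "id")
result.orderBy("id").show(false)
```

Row 1 gets an empty Remarks value, row 2 gets `lastname has value changed`, and row 3 lists both `name` and `lastname`.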
