Comparing columns in two DataFrames in Spark

Question

I have two DataFrames, each with a different number of columns. I need to compare three fields between them to check whether they are equal.

I tried the following approach, but it's not working.

if(df_table_stats("rec_cnt").equals(df_aud("REC_CNT")) || df_table_stats("hashcount").equals(df_aud("HASH_CNT")) || round(df_table_stats("hashsum"),0).equals(round(df_aud("HASH_TTL"),0)))
    {
        println("Job executed successfully")
    }

df_table_stats("rec_cnt") returns a Column rather than the actual value, so the condition evaluates to false.

Also, please explain the difference between df_table_stats.select("rec_cnt") and df_table_stats("rec_cnt").

Thanks.

Answer 1

Use SQL and inner join the two DataFrames on your conditions.
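
To illustrate the join idea, here is a minimal sketch rather than a definitive implementation. It assumes each DataFrame holds a single summary row, and the sample values, the object name JoinCompare and the helper fieldsMatch are all made up for the example. The three conditions are combined with && because the question asks whether all three fields are equal.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.round

object JoinCompare {
  // Hypothetical helper: true if at least one pair of rows agrees on all three fields.
  def fieldsMatch(tableStats: DataFrame, aud: DataFrame): Boolean = {
    // === builds a Column expression; Spark evaluates it only when an action runs.
    val condition =
      tableStats("rec_cnt") === aud("REC_CNT") &&
        tableStats("hashcount") === aud("HASH_CNT") &&
        round(tableStats("hashsum"), 0) === round(aud("HASH_TTL"), 0)

    // A row survives the inner join only when every condition holds.
    val matched = tableStats.join(aud, condition, "inner")

    // count() is an action, so this is where the comparison actually executes.
    matched.count() > 0
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("join-compare")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows standing in for the real df_table_stats and df_aud.
    val dfTableStats = Seq((100L, 5L, 1234.4)).toDF("rec_cnt", "hashcount", "hashsum")
    val dfAud = Seq((100L, 5L, 1234.3)).toDF("REC_CNT", "HASH_CNT", "HASH_TTL")

    if (fieldsMatch(dfTableStats, dfAud)) {
      println("Job executed successfully")
    } else {
      println("Stats do not match")
    }

    spark.stop()
  }
}
```

If the frames can hold more than one row, you would probably also want a key column in the join condition so that unrelated rows are not compared with each other.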

Answer 2

Per my comment, the syntax you're using is a simple column reference; it doesn't actually return data. Assuming you MUST use Spark for this, you'd want a method that actually returns the data, known in Spark as an action. For this case you can use take to return the first Row of data and extract the desired columns:

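import org.apache.spark.sql.Row

// take(1) is an action: it runs the query and returns an Array[Row] to the driver
val tableStatsRow: Row = df_table_stats.take(1).head
val audRow: Row = df_aud.take(1).head

// The type parameter must match the column's actual type (e.g. Long for a count() result)
val tableStatsRecCount = tableStatsRow.getAs[Int]("rec_cnt")
val audRecCount = audRow.getAs[Int]("REC_CNT")

// Repeat for the other values you need to capture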
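
Putting the pieces together, the following is a rough sketch of the whole check, including the difference between df("col") and df.select("col") that the question asks about. The column types (Long for the counts, Double for the sums), the sample values, and the names StatsCheck and statsMatch are assumptions made for the example, not something taken from the original tables.

```scala
import org.apache.spark.sql.{Column, DataFrame, Row, SparkSession}

object StatsCheck {
  // Hypothetical helper: compares the three fields of the first row of each frame.
  def statsMatch(tableStats: DataFrame, aud: DataFrame): Boolean = {
    // head() is an action: it executes the query and brings one Row to the driver.
    val t: Row = tableStats.head()
    val a: Row = aud.head()

    // getAs looks the field up by its exact name in the row's schema.
    // math.round stands in for Spark's round(col, 0); the two can differ on negative halves.
    t.getAs[Long]("rec_cnt") == a.getAs[Long]("REC_CNT") &&
      t.getAs[Long]("hashcount") == a.getAs[Long]("HASH_CNT") &&
      math.round(t.getAs[Double]("hashsum")) == math.round(a.getAs[Double]("HASH_TTL"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("stats-check")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows standing in for the real df_table_stats and df_aud.
    val dfTableStats = Seq((100L, 5L, 1234.4)).toDF("rec_cnt", "hashcount", "hashsum")
    val dfAud = Seq((100L, 5L, 1234.3)).toDF("REC_CNT", "HASH_CNT", "HASH_TTL")

    // df("col") is a Column: an unevaluated expression that refers to the column.
    val asColumn: Column = dfTableStats("rec_cnt")
    // df.select("col") is a new one-column DataFrame; it is also lazy until an action runs.
    val asFrame: DataFrame = dfTableStats.select("rec_cnt")
    // Only an action such as head() produces an actual value.
    val actualValue: Long = asFrame.head().getLong(0)
    println(s"Column: $asColumn, value: $actualValue")

    if (statsMatch(dfTableStats, dfAud)) println("Job executed successfully")
    else println("Stats do not match")

    spark.stop()
  }
}
```

This also shows why the original if never fires: Column.equals compares two expression objects, not the data they refer to, so nothing is fetched or compared until an action such as head() or take() runs.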

However, Spark is definitely overkill if this is all you're using it for. You could use a simple JDBC library for Scala, such as ScalikeJDBC, to run these queries and capture the primitive values in the results.

Solution 1
Score 0, 2017-10-06 01:21:28

Solution 2
Score -1, accepted, 2017-10-06 00:44:01
