How to compare two data frames in Spark Scala?

I have two data frames, each containing a maximum timestamp value.

val Table1max=spark.read.format("parquet").option("header","true").load(s"${SourcePath}/ab12")
Table1max.createOrReplaceTempView("temp") 

val table2max=spark.read.format("parquet").option("header","true").load(s"${SourcePath}/abc")
table2max.createOrReplaceTempView("temp1")

Then I select the max update date from both:

val table1maxvalue = spark.sql(s"select max(UPDATE_DATE) from temp")
val table2maxvalue= spark.sql(s"select max(UPDATE_DATE) from temp1")

Here table1maxvalue and table2maxvalue are dataframes.

table1maxvalue
+--------------------+
|    max(UPDATE_DATE)|
+--------------------+
|2022-05-02 01:04:...|
+--------------------+

table2maxvalue

+--------------------+
|    max(UPDATE_DATE)|
+--------------------+
|2022-05-02 01:04:...|
+--------------------+

Now, how can I check whether table1maxvalue > table2maxvalue and do something when it is? For example:

if(table1maxvalue<table2maxvalue){
  // do something
}

Because these are data frames, I get this error: value >= is not a member of org.apache.spark.sql.DataFrame

Please suggest.

You are trying to compare one DataFrame to another DataFrame. You actually need to reference the first row, and then retrieve the value from that row.

In this case you can use the following:

table1maxvalue       // DataFrame with a single row
  .head()            // get the first row
  .getTimestamp(0)   // first column as java.sql.Timestamp (use getDate(0) only if the column is DateType)
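
Putting it together, here is a minimal sketch of the whole comparison. It assumes UPDATE_DATE is a TimestampType column, which the question's output (2022-05-02 01:04:...) suggests but does not confirm. The local SparkSession, the sample rows, and the helper name maxUpdateDate are hypothetical stand-ins for illustration; in the question, spark and the parquet-backed views already exist.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession

object CompareMaxTimestamps {
  // Hypothetical helper: runs the max query on a temp view and pulls the
  // single value out of the one-row DataFrame it returns.
  def maxUpdateDate(spark: SparkSession, view: String): Timestamp =
    spark.sql(s"select max(UPDATE_DATE) from $view")
      .head()           // first (and only) Row
      .getTimestamp(0)  // first column as java.sql.Timestamp

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("compare-max")
      .getOrCreate()
    import spark.implicits._

    // Sample data standing in for the two parquet tables (assumed values).
    Seq(Timestamp.valueOf("2022-05-02 01:04:00"))
      .toDF("UPDATE_DATE").createOrReplaceTempView("temp")
    Seq(Timestamp.valueOf("2022-05-01 23:59:00"))
      .toDF("UPDATE_DATE").createOrReplaceTempView("temp1")

    val table1max = maxUpdateDate(spark, "temp")
    val table2max = maxUpdateDate(spark, "temp1")

    // These are plain JVM values now, so an ordinary `if` works.
    // Timestamp offers before/after/compareTo for ordering.
    if (table1max.before(table2max)) {
      println("table1 is older than table2")
    } else {
      println("table1 is newer than or equal to table2")
    }

    spark.stop()
  }
}
```

Note that max over an empty table yields null, so getTimestamp(0) would return null in that case; wrapping the result in Option(...) before comparing is one way to guard against it.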
