I have a SQL query that performs a left join, and its SELECT clause also references columns from the right table. Could you please help me convert it to Spark DataFrames and get the result in spark-shell? I don't want to run the SQL code in Spark; I want to use the DataFrame API instead.
I know the join syntax in Scala, but I don't know how to select from the right table (here, count(w.id2)), since as far as I can tell a left join doesn't give access to the right table's columns.
Thank you!
select count(x.user_id) user_id_count, count(w.id2) current_id2_count
from
  (select user_id
   from tb1
   where year = '2021'
     and month = 1
  ) x
left join
  (select id1, max(id2) id2 from tb2 group by id1) w
on x.user_id = w.id1;
In Spark I would create two DataFrames, x and w, and join them:
var x = spark.sqlContext.table("tb1").where("year='2021' and month=1")
var w = spark.sqlContext.table("tb2").groupBy("id1").agg(max("id2").alias("id2"))
var joined = x.join(w, x("user_id")===w("id1"), "left")
Your request is a bit hard to follow, but I'll try to answer by taking the SQL code you provided as the baseline and reproducing it with Spark.
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count, max}

// Reading tb1 (x) and filtering for Jan 2021, selecting only "user_id"
val x: DataFrame = spark.read
  .table("tb1")
  .filter(col("year") === "2021")
  .filter(col("month") === 1)
  .select("user_id")
// Reading tb2 (w) and for each "id1" getting the max "id2", aliased back to "id2"
val w: DataFrame = spark.read
  .table("tb2")
  .groupBy(col("id1"))
  .agg(max("id2").as("id2"))
// Joining tb1 (x) and tb2 (w) on "user_id" === "id1", then counting user_id and id2.
// After the left join the right-side columns are available; unmatched rows hold null,
// and count skips nulls, just like count(w.id2) in the SQL.
val xJoinsW: DataFrame = x
  .join(w, x("user_id") === w("id1"), "left")
  .select(count(col("user_id")).as("user_id_count"), count(col("id2")).as("current_id2_count"))
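To check the join-then-count pattern without the real tables, here is a minimal sketch on made-up in-memory data. The sample rows, the app name, and the local master setting are assumptions for illustration only; in spark-shell the session already exists as spark, so the builder lines can be skipped.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, count, max}

// Assumption: a local session for experimenting; spark-shell already provides `spark`.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("left-join-count-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical stand-in for the filtered tb1: three users.
val x = Seq(1, 2, 3).toDF("user_id")

// Hypothetical stand-in for tb2: user 2 has no rows here.
val tb2 = Seq((1, 10), (1, 20), (3, 30)).toDF("id1", "id2")

// One row per id1 with the max id2, named "id2" again.
val w = tb2.groupBy("id1").agg(max("id2").as("id2"))

// Left join keeps all three users; user 2 gets a null id2.
val counts = x
  .join(w, x("user_id") === w("id1"), "left")
  .agg(
    count(col("user_id")).as("user_id_count"),
    count(col("id2")).as("current_id2_count")
  )

// Single result row: count ignores the null id2 of the unmatched user.
val row = counts.first()
val userIdCount = row.getLong(0)
val currentId2Count = row.getLong(1)
```

With this sample data, userIdCount should be 3 and currentId2Count should be 2, because the one user without a match in the right table contributes a null id2 that count does not include.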
A small but relevant remark: since you're using Scala and Spark, I would suggest using val rather than var. A val is final and cannot be reassigned, whereas a var can be reassigned later.
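As a small illustration of the val versus var difference, the following sketch can be pasted into spark-shell or any Scala REPL. The names fixed and counter are made up for the example.

```scala
// A val is bound once; uncommenting the reassignment below would fail to compile
// with a "reassignment to val" error.
val fixed = 1
// fixed = 2

// A var can be rebound to a new value later.
var counter = 1
counter = counter + 1
```

Since DataFrames are immutable and each transformation returns a new DataFrame, there is usually no need to rebind a name, so val tends to fit Spark code well.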
Lastly, feel free to replace the Spark reading mechanism with whatever you prefer.