简体   繁体   中英

Join two dataframes with different records and size in Spark

It seems this issue asked couple of times, but the solutions that suggested in previous questions not working for me.

I have two dataframe s with different dimensions as shown in picture below. The table two second was part of table one first but after some processing on it I added one more column column4 . Now I want to join these two tables such that I have table three Required after joining.

Things that tried.

So I did couple of different solution but no one works for me.

I tried

val required =first.join(second, first("PDE_HDR_CMS_RCD_NUM") === second("PDE_HDR_CMS_RCD_NUM") , "left_outer")

Also I tried

val required = first.withColumn("SEQ", when(second.col("PDE_HDR_FILE_ID") === (first.col("PDE_HDR_FILE_ID").alias("PDE_HDR_FILE_ID1")), second.col("uniqueID")).otherwise(lit(0)))

In the second attempt I used .alias after I get an error that says

Error occured during extract process. Error: org.apache.spark.sql.AnalysisException: Resolved attribute(s) uniqueID#775L missing from.

Thanks for taking time to read my question

在此处输入图片说明

To generate the wanted result, you should join the two tables on column(s) that are row-identifying in your first table. Assuming c1 + c2 + c3 uniquely identifies each row in the first table, here's an example using a partial set of your sample data:

import org.apache.spark.sql.functions._
import spark.implicits._

val df1 = Seq(
  (1, "e", "o"),
  (4, "d", "t"),
  (3, "f", "e"),
  (2, "r", "r"),
  (6, "y", "f"),
  (5, "t", "g"),
  (1, "g", "h"),
  (4, "f", "j"),
  (6, "d", "k"),
  (7, "s", "o")
).toDF("c1", "c2", "c3")

val df2 = Seq(
  (3, "f", "e", 444),
  (5, "t", "g", 555),
  (7, "s", "o", 666)
).toDF("c1", "c2", "c3", "c4")

df1.join(df2, Seq("c1", "c2", "c3"), "left_outer").show
// +---+---+---+----+
// | c1| c2| c3|  c4|
// +---+---+---+----+
// |  1|  e|  o|null|
// |  4|  d|  t|null|
// |  3|  f|  e| 444|
// |  2|  r|  r|null|
// |  6|  y|  f|null|
// |  5|  t|  g| 555|
// |  1|  g|  h|null|
// |  4|  f|  j|null|
// |  6|  d|  k|null|
// |  7|  s|  o| 666|
// +---+---+---+----+

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM