I have a SQL table that I have to update using data from another table. For this purpose, I compute a DataFrame. So I have two DataFrames: the one I compute and the one I read from the database.
val myDF = spark.read.<todo something>.load()
val dbDF = spark.read.format("jdbc").<...>.load()
In the end, both DataFrames have the same structure.
For example:
myDF
| key | column |
| --- | --- |
| key1 | 1 |
| key2 | 2 |
| key3 | 3 |
dbDF
| key | column |
| --- | --- |
| key1 | 5 |
| key2 | 5 |
| key3 | 5 |
I need to get a new DF that still has a single value column named column; in this example it holds the sum of the two inputs.
newDF
| key | column |
| --- | --- |
| key1 | 6 |
| key2 | 7 |
| key3 | 8 |
To achieve this, I perform the following steps:
myDF
  .as("left")
  .join(dbDF.as("right"), "key")
  .withColumn("column_temp", $"left.column" + $"right.column")
  .drop($"left.column")
  .drop($"right.column")
  .withColumnRenamed("column_temp", "column")
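As a side note (a sketch, using sample data built to match the tables above): the four steps — temporary column, two drops, rename — can usually be collapsed into a single select that computes the merged column directly.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("merge-sketch").getOrCreate()
import spark.implicits._

// Sample data matching the example tables above
val myDF = Seq(("key1", 1), ("key2", 2), ("key3", 3)).toDF("key", "column")
val dbDF = Seq(("key1", 5), ("key2", 5), ("key3", 5)).toDF("key", "column")

// One select instead of withColumn + drop + drop + withColumnRenamed:
// the join on "key" keeps a single key column, and the select keeps
// only the merged value column under its final name.
val newDF = myDF.as("left")
  .join(dbDF.as("right"), "key")
  .select($"key", ($"left.column" + $"right.column").as("column"))

newDF.show()
```

This avoids the intermediate column entirely; the same shape works for coalesce or any other per-pair expression.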
I have to repeat these steps for every column I need to recalculate. In other words, my joins are not supposed to add new columns; I have to merge each pair of same-named columns into a single column. I can compute the new column as the sum of the two, or I can just pick the first non-null of the two, like this:
import org.apache.spark.sql.functions.coalesce

myDF
  .as("left")
  .join(dbDF.as("right"), "key")
  .withColumn("column_temp", coalesce($"left.column", $"right.column"))
  .drop($"left.column")
  .drop($"right.column")
  .withColumnRenamed("column_temp", "column")
And since my DataFrames have many value columns and only one or two key columns, I have to repeat the steps above for each of them.
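One way to avoid repeating the steps per column (a sketch, not the only option): keep the key columns fixed and build the select list programmatically from the shared column names, with the merge function (sum, coalesce, ...) pluggable per column. The `merge` helper and the sample data below are illustrative, not part of the original question.

```scala
import org.apache.spark.sql.{Column, SparkSession}
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[1]").appName("merge-all").getOrCreate()
import spark.implicits._

// Sample data matching the example tables above
val myDF = Seq(("key1", 1), ("key2", 2), ("key3", 3)).toDF("key", "column")
val dbDF = Seq(("key1", 5), ("key2", 5), ("key3", 5)).toDF("key", "column")

val keyCols   = Seq("key")
val valueCols = myDF.columns.filterNot(keyCols.contains)

// Merge function applied to each pair of same-named columns;
// swap in coalesce(l, r) to prefer the non-null value instead of summing.
def merge(l: Column, r: Column): Column = l + r

// Join once, then merge every value column in a single select.
val merged = myDF.as("l")
  .join(dbDF.as("r"), keyCols)
  .select(
    keyCols.map(col) ++
    valueCols.map(c => merge(col(s"l.$c"), col(s"r.$c")).as(c)): _*
  )
```

With this shape, adding or removing value columns in the source tables requires no code changes, only the key-column list.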
My question is: is there a more effective way to do this, or am I already doing it right?
myDF.join(dbDF, myDF.col("key").equalTo(dbDF.col("key")))
    .select(myDF.col("key"), myDF.col("column").plus(dbDF.col("column")).alias("column"));
Can you try this? It is an inner join, so only the rows in the left table that have a match in the right are selected. Is that your case?