
Spark: How to merge two similar columns from two DataFrames into one column when joining?

I have an SQL table that I need to update using data from another source.

For this purpose, I compute a DataFrame.

So I have two DataFrames: the one I compute and the one I read from the database.

val myDF = spark.read.<todo something>.load()

val dbDF = spark.read.format("jdbc").<...>.load()

In the end, both DataFrames have the same structure.

For example:

myDF

key column
key1 1
key2 2
key3 3

dbDF

key column
key1 5
key2 5
key3 5

I need a new DataFrame with only one value column, named column, that combines the two input columns (here, their sum):

newDF

key column
key1 6
key2 7
key3 8

To do this, I perform the following steps:

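import spark.implicits._ // enables the $"..." column syntax

val newDF = myDF
  .as("left")
  .join(dbDF.as("right"), "key")
  .withColumn("column_temp", $"left.column" + $"right.column")
  .drop($"left.column")
  .drop($"right.column") // was s"right.column": a plain string that matches no column, so nothing was dropped
  .withColumnRenamed("column_temp", "column")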
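A minimal sketch of the same computation in a single step, assuming a local SparkSession and the sample data above rebuilt in memory with toDF (the app name "merge-sum" is arbitrary). Joining on Seq("key") keeps one key column, so a single select can produce the output without the temporary column, the two drop calls, and the rename.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Local session for the sketch; the app name is arbitrary.
val spark = SparkSession.builder().master("local[*]").appName("merge-sum").getOrCreate()
import spark.implicits._

// The sample inputs from the question, built in memory instead of loaded.
val myDF: DataFrame = Seq(("key1", 1), ("key2", 2), ("key3", 3)).toDF("key", "column")
val dbDF: DataFrame = Seq(("key1", 5), ("key2", 5), ("key3", 5)).toDF("key", "column")

// A "using" join on Seq("key") yields a single key column; the aliases
// let each side's "column" be referenced unambiguously.
val newDF: DataFrame = myDF.as("left")
  .join(dbDF.as("right"), Seq("key"))
  .select($"key", ($"left.column" + $"right.column").as("column"))

newDF.orderBy("key").show()
```

Because select lists exactly the output columns, there is nothing left to drop or rename afterwards.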

I have to do this for every column that I need to compute.

In other words, my joins are not meant to add new columns; I have to merge each pair of similar columns into a single column.

I can compute the new column as the sum of the two columns, or I can just pick the non-null value from the two columns, like this:

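import org.apache.spark.sql.functions.coalesce
import spark.implicits._ // enables the $"..." column syntax

val newDF = myDF
  .as("left")
  .join(dbDF.as("right"), "key")
  .withColumn("column_temp", coalesce($"left.column", $"right.column"))
  .drop($"left.column")
  .drop($"right.column") // $"...", not s"...", so the aliased column is actually dropped
  .withColumnRenamed("column_temp", "column")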
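To see what coalesce does here, a small sketch with hypothetical inputs containing gaps (None becomes SQL NULL), assuming a local SparkSession. coalesce returns its first non-null argument, so the left value wins when present and the right value fills in otherwise.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.coalesce

// Local session for the sketch; the app name is arbitrary.
val spark = SparkSession.builder().master("local[*]").appName("merge-coalesce").getOrCreate()
import spark.implicits._

// Hypothetical data with missing values on either side.
val myDF: DataFrame = Seq[(String, Option[Int])](("key1", Some(1)), ("key2", None), ("key3", Some(3)))
  .toDF("key", "column")
val dbDF: DataFrame = Seq[(String, Option[Int])](("key1", Some(5)), ("key2", Some(5)), ("key3", None))
  .toDF("key", "column")

// Left value if it is not null, otherwise the right value.
val newDF: DataFrame = myDF.as("left")
  .join(dbDF.as("right"), Seq("key"))
  .select($"key", coalesce($"left.column", $"right.column").as("column"))

newDF.orderBy("key").show()
```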

When my DataFrames have many columns and only one or two key columns, I have to repeat the above steps for each column.
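One way to avoid repeating the steps per column is to build the select list programmatically. Below is a sketch with a hypothetical helper, mergeColumns (not a Spark API), assuming both DataFrames share the same non-key column names and that those names contain no dots. It joins once on the key columns and applies one combining function to every column pair, keeping the original names.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col}

// Local session for the sketch; the app name is arbitrary.
val spark = SparkSession.builder().master("local[*]").appName("merge-many").getOrCreate()
import spark.implicits._

// Hypothetical helper: joins on `keys` and merges every non-key column pair
// with `combine`, aliasing each result back to the original column name.
def mergeColumns(left: DataFrame, right: DataFrame, keys: Seq[String])
                (combine: (Column, Column) => Column): DataFrame = {
  val valueCols = left.columns.filterNot(c => keys.contains(c))
  val merged = valueCols.map(c => combine(col(s"l.$c"), col(s"r.$c")).as(c))
  left.as("l")
    .join(right.as("r"), keys)
    .select(keys.map(col) ++ merged: _*)
}

// Hypothetical inputs with two value columns, x and y.
val a = Seq(("key1", 1, 10), ("key2", 2, 20)).toDF("key", "x", "y")
val b = Seq(("key1", 5, 1), ("key2", 5, 2)).toDF("key", "x", "y")

val summed = mergeColumns(a, b, Seq("key"))(_ + _)
val firstNonNull = mergeColumns(a, b, Seq("key"))((l, r) => coalesce(l, r))

summed.orderBy("key").show()
firstNonNull.orderBy("key").show()
```

Everything is still done in one join and one select, so Spark plans a single projection regardless of how many value columns there are.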

My question is:

Is there a more efficient way to do this, or am I doing it right?

    // Join on key, then select the key plus the sum of the two "column" values.
    myDF.join(dbDF, myDF.col("key").equalTo(dbDF.col("key")))
            .select(myDF.col("key"),
                    myDF.col("column").plus(dbDF.col("column")).as("column"));

Can you try this? It is an inner join, so only the rows of the left table that have a matching key in the right table are kept. Is that your case?
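If unmatched keys from either side must be kept as well, one option is a full outer join. The sketch below uses hypothetical data and assumes that a value missing on one side should count as 0 in the sum; that default is an assumption, not something stated in the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, lit}

// Local session for the sketch; the app name is arbitrary.
val spark = SparkSession.builder().master("local[*]").appName("merge-outer").getOrCreate()
import spark.implicits._

// Hypothetical data: key4 exists only on the left, key3 only on the right.
val myDF: DataFrame = Seq(("key1", 1), ("key2", 2), ("key4", 4)).toDF("key", "column")
val dbDF: DataFrame = Seq(("key1", 5), ("key2", 5), ("key3", 5)).toDF("key", "column")

// With a "full_outer" using-join, Spark fills the key from whichever side has it;
// the missing side's value is NULL, so it is replaced by 0 (assumed default) before adding.
val newDF: DataFrame = myDF.as("left")
  .join(dbDF.as("right"), Seq("key"), "full_outer")
  .select($"key",
    (coalesce($"left.column", lit(0)) + coalesce($"right.column", lit(0))).as("column"))

newDF.orderBy("key").show()
```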
