简体   繁体   中英

How to update dataframe column in Spark Scala after join?

Join of two dataframes results into almost 60 columns. Most of them suppose to stay as is, but some require update based on values in other columns. Is there a way to update those columns w/o calculating new, removing the originals and renaming the calculated back?

Simplified example: the revenue in $"Sales column from the left dataframe is supposed to be weighted by the $"Weight in the join results. Is there an efficient way to make the calculation w/o generating the $"SalesWeighted as a new column, dropping the original $Sales and re-naming $SalesWeighted into $Sales ?

val l = Seq((1, 50), (2, 35), (3, 66))
            .toDF("Id", "Sales")

val r = Seq((1, "Premium", 0.2), (1, "Standard", 0.8), 
            (2, "Premium", 0.4), (2, "Standard", 0.6), 
            (3, "Premium", 0.333), (3, "Standard", 0.333), (3, "Garbage", 0.334))
            .toDF("Id", "Grade", "Weight")

display(l.join(r, Seq("Id")).withColumn("SalesWeighted", $"Sales"*$"Weight")
            .orderBy($"Id", $"Grade"))

Use Drop to remove the unnecessary columns

val l = Seq((1, 50), (2, 35), (3, 66))
                .toDF("Id", "Sales")

    val r = Seq((1, "Premium", 0.2), (1, "Standard", 0.8), 
                (2, "Premium", 0.4), (2, "Standard", 0.6), 
                (3, "Premium", 0.333), (3, "Standard", 0.333), (3, "Garbage", 0.334))
                .toDF("Id", "Grade", "Weight")

    display(l.join(r, Seq("Id")).withColumn("SalesWeighted", $"Sales"*$"Weight").drop($"Sales")
                .orderBy($"Id", $"Grade"))

You can simply name the new column the same as the column to be replaced:

l.join(r, Seq("Id")).withColumn("Sales", $"Sales" * $"Weight").
  orderBy($"Id", $"Grade")

Or, just use select :

l.join(r, Seq("Id")).
  select($"Id", $"Grade", $"Weight", ($"Sales" * $"Weight").as("Sales")).
  orderBy($"Id", $"Grade")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM