
Update columns of spark dataframe of type Map with desired key and value

I have the following Spark DataFrame in which every column except the primary key column emp_id holds a map with the keys 'from' and 'to' (either of which can have a null value). For each of these map columns I want to compare 'from' and 'to' and add a new key 'change' to the map, whose value is: a) 'insert' if 'from' is null and 'to' is not null, b) 'delete' if 'to' is null and 'from' is not null, c) 'update' if both 'from' and 'to' are not null and their values differ.

Note: columns whose value is null should remain untouched.

How can we achieve this in Scala?

| emp_id | emp_city | emp_name | emp_phone | emp_sal | emp_site |
|---|---|---|---|---|---|
| 1 | null | [from -> Will, to -> Watson] | null | [from -> 1000, to -> 8000] | [from ->, to -> Seattle] |
| 3 | null | [from -> Norman, to -> Nate] | null | [from -> 1000, to -> 8000] | [from -> CherryHill, to -> Newark] |
| 4 | [from ->, to -> Iowa] | [from ->, to -> Ian] | [from ->, to -> 1004] | [from ->, to -> 8000] | [from ->, to -> Des Moines] |

Expected:

| emp_id | emp_city | emp_name | emp_phone | emp_sal | emp_site |
|---|---|---|---|---|---|
| 1 | null | [from -> Will, to -> Watson, change -> update] | null | [from -> 1000, to -> 8000, change -> update] | [from ->, to -> Seattle, change -> insert] |
| 3 | null | [from -> Norman, to -> Nate, change -> update] | null | [from -> 1000, to -> 8000, change -> update] | [from -> CherryHill, to -> Newark, change -> update] |
| 4 | [from ->, to -> Iowa, change -> insert] | [from ->, to -> Ian, change -> insert] | [from ->, to -> 1004, change -> insert] | [from ->, to -> 8000, change -> insert] | [from ->, to -> Des Moines, change -> insert] |

One way to achieve this is by using a UDF, which is not a great solution, but I can't think of any other.

Try not to use a UDF if possible.
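For reference, the sample input above can be reproduced with something like the following. This is only a minimal sketch to make the problem testable; the case class Emp and the string-typed map values are assumptions, not part of the original post.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical case class used only to build the sample data shown above;
// every non-key column is a Map[String, String] whose values may be null.
case class Emp(
    emp_id: Int,
    emp_city: Map[String, String],
    emp_name: Map[String, String],
    emp_phone: Map[String, String],
    emp_sal: Map[String, String],
    emp_site: Map[String, String])

val df = Seq(
  Emp(1, null, Map("from" -> "Will", "to" -> "Watson"), null,
      Map("from" -> "1000", "to" -> "8000"), Map("from" -> null, "to" -> "Seattle")),
  Emp(3, null, Map("from" -> "Norman", "to" -> "Nate"), null,
      Map("from" -> "1000", "to" -> "8000"), Map("from" -> "CherryHill", "to" -> "Newark")),
  Emp(4, Map("from" -> null, "to" -> "Iowa"), Map("from" -> null, "to" -> "Ian"),
      Map("from" -> null, "to" -> "1004"), Map("from" -> null, "to" -> "8000"),
      Map("from" -> null, "to" -> "Des Moines"))
).toDF()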

import org.apache.spark.sql.functions.{col, udf}

// Adds a "change" key describing how the value moved from 'from' to 'to'.
// Null maps, and maps where 'from' equals 'to', are returned unchanged.
val updateMap = udf((input: Map[String, String]) => {
  if (input == null || input.isEmpty)
    input                                   // leave null/empty columns untouched
  else if (input("from") == null && input("to") != null)
    input + ("change" -> "insert")
  else if (input("from") != null && input("to") == null)
    input + ("change" -> "delete")
  else if (input("from") != null && input("to") != null && input("from") != input("to"))
    input + ("change" -> "update")
  else
    input                                   // nothing changed, keep the map as-is
})

// Apply the UDF to every column except the primary key (emp_id, the first column).
val result = df.columns.tail.foldLeft(df) { (acc, name) =>
  acc.withColumn(name, updateMap(col(name)))
}

Make sure your columns are of type Map[String, String].
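If you'd rather avoid the UDF entirely, here is a minimal sketch using only built-in functions (it assumes Spark 2.4+ for map_concat; the column names and the result2 variable are taken from the example above, not from the original answer):

import org.apache.spark.sql.functions._

// UDF-free sketch: derive the "change" label with when(), then append it
// to the existing map with map_concat (available since Spark 2.4).
val result2 = df.columns.tail.foldLeft(df) { (acc, name) =>
  val from = col(name)("from")
  val to   = col(name)("to")
  val change = when(from.isNull && to.isNotNull, lit("insert"))
    .when(from.isNotNull && to.isNull, lit("delete"))
    .when(from.isNotNull && to.isNotNull && from =!= to, lit("update"))
  acc.withColumn(
    name,
    when(col(name).isNull, col(name))                                           // null columns stay untouched
      .when(change.isNotNull, map_concat(col(name), map(lit("change"), change)))
      .otherwise(col(name)))                                                    // from == to: leave as-is
}

Keeping everything in built-in column expressions lets Catalyst optimize the plan, whereas a UDF is a black box to the optimizer.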

Hope this helps!
