Spark SQL - replace nulls with default values
I have the following DataFrame schema:
root
 |-- firstname: string (nullable = true)
 |-- lastname: string (nullable = true)
 |-- cities: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- postcode: string (nullable = true)
My DataFrame looks like this:
+---------+--------+-----------------------------------+
|firstname|lastname|cities                             |
+---------+--------+-----------------------------------+
|John     |Doe     |[[New York,A000000], [Warsaw,null]]|
|John     |Smith   |[[Berlin,null]]                    |
|John     |null    |[[Paris,null]]                     |
+---------+--------+-----------------------------------+
I want to replace all null values with the string "unknown". When I use the na.fill function, I get the following DataFrame:
df.na.fill("unknown").show()
+---------+--------+-----------------------------------+
|firstname|lastname|cities                             |
+---------+--------+-----------------------------------+
|John     |Doe     |[[New York,A000000], [Warsaw,null]]|
|John     |Smith   |[[Berlin,null]]                    |
|John     |unknown |[[Paris,null]]                     |
+---------+--------+-----------------------------------+
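How can I replace all null values in the DataFrame, including those inside the nested array?
na.fill only fills nulls in top-level columns; it does not fill null struct fields inside the elements of an array column. One approach is to use a UDF, as shown below: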
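import org.apache.spark.sql.functions._
import org.apache.spark.sql.Row
// Needed for toDF and $"..." outside spark-shell; assumes a SparkSession named `spark`.
import spark.implicits._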
case class City(name: String, postcode: String)
val df = Seq(
  ("John", "Doe", Seq(City("New York", "A000000"), City("Warsaw", null))),
  ("John", "Smith", Seq(City("Berlin", null))),
  ("John", null, Seq(City("Paris", null)))
).toDF("firstname", "lastname", "cities")
val defaultStr = "unknown"
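// Replace null struct fields in every element of the array with `default`.
// Returning City rather than a tuple keeps the field names name/postcode
// (a tuple would rename them to _1/_2); a null array is passed through as null.
def patchNull(default: String) = udf( (s: Seq[Row]) =>
  if (s == null) null
  else s.map( r => City(
    Option(r.getAs[String]("name")).getOrElse(default),
    Option(r.getAs[String]("postcode")).getOrElse(default)
  ) )
)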
df.
  withColumn("cities", patchNull(defaultStr)($"cities")).
  na.fill(defaultStr).
  show(false)
// +---------+--------+--------------------------------------+
// |firstname|lastname|cities                                |
// +---------+--------+--------------------------------------+
// |John     |Doe     |[[New York,A000000], [Warsaw,unknown]]|
// |John     |Smith   |[[Berlin,unknown]]                    |
// |John     |unknown |[[Paris,unknown]]                     |
// +---------+--------+--------------------------------------+
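If you are on Spark 2.4 or later, the same patching can probably be done without a UDF by using the built-in SQL higher-order function transform together with coalesce and named_struct. The following is only a sketch under that version assumption, not part of the original answer. The local SparkSession setup, the app name and the value names `default` and `patched` are illustrative choices, and the interpolated default is assumed to contain no single quotes.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

case class City(name: String, postcode: String)

// In spark-shell, getOrCreate() simply returns the existing session.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("fill-nested-nulls")
  .getOrCreate()
import spark.implicits._

val df = Seq(
  ("John", "Doe", Seq(City("New York", "A000000"), City("Warsaw", null))),
  ("John", "Smith", Seq(City("Berlin", null))),
  ("John", null, Seq(City("Paris", null)))
).toDF("firstname", "lastname", "cities")

val default = "unknown"

// transform rebuilds each array element; coalesce substitutes the default
// for a null field, and named_struct keeps the field names name/postcode.
val patched = df
  .withColumn("cities", expr(
    s"""transform(cities, c -> named_struct(
       |  'name', coalesce(c.name, '$default'),
       |  'postcode', coalesce(c.postcode, '$default')))""".stripMargin))
  .na.fill(default) // still needed for the top-level string columns

patched.show(false)
```

Because this stays inside Catalyst expressions, it avoids the serialization cost of a UDF. Note that a null array stays null, while a null element inside the array becomes a struct with both fields set to the default.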