Split single String column to multiple columns in Spark-Scala
I have a DataFrame:
+----+--------------------------+
|city|Types                     |
+----+--------------------------+
|BNG |school                    |
|HYD |school,restaurant         |
|MUM |school,restaurant,hospital|
+----+--------------------------+
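I want to split the Types column on ',' into multiple columns.
The problem is that the number of values per row is not fixed, so I don't know how to do it.
I saw a related question for PySpark, but I want to do this in Spark with Scala rather than PySpark.
Any help is appreciated.
Thanks in advance.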
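One way to deal with the irregular number of values in the column is to change the representation: instead of a variable number of columns, use one Boolean column per known type.
For example: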
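import spark.implicits._  // needed for toDF; assumes a SparkSession named spark, as in spark-shell
val data = Seq(("BNG", "school"), ("HYD", "school,res"), ("MUM", "school,res,hos")).toDF("city", "types")
data.show()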
+----+--------------+
|city|         types|
+----+--------------+
| BNG|        school|
| HYD|    school,res|
| MUM|school,res,hos|
+----+--------------+
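import org.apache.spark.sql.functions.{array_contains, col, split}

// Split the string once, then add one Boolean flag per known type
val typesArr = split(col("types"), ",")
data
  .withColumn("isSchool", array_contains(typesArr, "school"))
  .withColumn("isRes", array_contains(typesArr, "res"))
  .withColumn("isHos", array_contains(typesArr, "hos"))
  .show()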
+----+--------------+--------+-----+-----+
|city|         types|isSchool|isRes|isHos|
+----+--------------+--------+-----+-----+
| BNG|        school|    true|false|false|
| HYD|    school,res|    true| true|false|
| MUM|school,res,hos|    true| true| true|
+----+--------------+--------+-----+-----+
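If you do need one column per value, as the question asks, a possible approach is to find the largest number of values in any row first and then build that many columns. The following is only a minimal sketch under some assumptions: the object name `SplitTypesColumn`, the helper `splitTypes`, the intermediate column `arr`, and the output names `type_0`, `type_1`, ... are all made up for illustration, and the input is assumed to be non-empty with a non-null `types` column. Each lookup is guarded with `when(size(...) > i, ...)` so that positions beyond the end of a shorter array become null instead of depending on how the session treats out-of-range indexes.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max, size, split, when}

object SplitTypesColumn {

  // Hypothetical helper: expands the comma-separated "types" column
  // into type_0, type_1, ... (as many columns as the longest row needs).
  def splitTypes(df: DataFrame): DataFrame = {
    // Split the string once into an array column
    val withArr = df.withColumn("arr", split(col("types"), ","))

    // Largest number of values in any row; this triggers one pass over the data.
    // Assumes the DataFrame is non-empty, otherwise the aggregate is null.
    val maxLen = withArr.agg(max(size(col("arr")))).head().getInt(0)

    // One column per position; the guard leaves null where a row has fewer values
    val typeCols = (0 until maxLen).map { i =>
      when(size(col("arr")) > i, col("arr").getItem(i)).as(s"type_$i")
    }

    withArr.select(col("city") +: typeCols: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("split-types")
      .getOrCreate()
    import spark.implicits._

    val data = Seq(
      ("BNG", "school"),
      ("HYD", "school,res"),
      ("MUM", "school,res,hos")
    ).toDF("city", "types")

    splitTypes(data).show()
    spark.stop()
  }
}
```

With the sample data this should give the columns city, type_0, type_1 and type_2, with nulls in the trailing columns for BNG and HYD. The trade-off compared with the Boolean flags above is that the schema now depends on the data, and computing the maximum length costs an extra job before the select.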