
Split single String column to multiple columns in Spark-Scala

I have a dataframe:

+----+--------------------------+
|city|Types                     |
+----+--------------------------+
|BNG |school                    |
|HYD |school,restaurant         |
|MUM |school,restaurant,hospital|
+----+--------------------------+

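I want to split the Types column into multiple columns on ','.

The problem is that the number of values per row is not fixed, so I don't know how to do it.

I have seen a related question for PySpark, but I want to do this in Spark-Scala rather than PySpark.

Any help is appreciated.

Thanks in advance.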
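A minimal sketch of the literal request (one output column per position), assuming Spark 3.x with Scala 2.12/2.13. The object `SplitToColumns`, the helper `splitToColumns`, and the `Types_0, Types_1, ...` column naming are hypothetical. It runs one extra aggregation to find the longest split array, then builds that many columns; rows with fewer values get null in the trailing columns. Each `getItem` is guarded by `when(size(...) > i, ...)` so short rows yield null instead of an out-of-bounds error when ANSI mode is enabled.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max, size, split, when}

object SplitToColumns {
  // Splits `colName` on `sep` (a regex) into <colName>_0, <colName>_1, ...
  // sized to the widest row; missing trailing values become null.
  def splitToColumns(df: DataFrame, colName: String, sep: String): DataFrame = {
    val parts = split(col(colName), sep)
    // Extra pass over the data to find the maximum number of parts
    val row = df.select(max(size(parts))).head()
    val n = if (row.isNullAt(0)) 0 else row.getInt(0)
    // Only read index i when the array is long enough; otherwise null
    val extra = (0 until n).map(i => when(size(parts) > i, parts.getItem(i)).as(s"${colName}_$i"))
    df.select((col("*") +: extra): _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("split-demo").getOrCreate()
    import spark.implicits._
    val df = Seq(
      ("BNG", "school"),
      ("HYD", "school,restaurant"),
      ("MUM", "school,restaurant,hospital")
    ).toDF("city", "Types")
    splitToColumns(df, "Types", ",").show(false)
    spark.stop()
  }
}
```

Finding the width costs one extra job over the data; if the maximum number of values is known in advance, `n` can be passed in directly instead.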

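One way to handle the irregular number of values in the column is to change the representation: instead of one column per position, add one boolean column per known type.

For example: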

// Assumes a SparkSession named `spark`, as in spark-shell
import spark.implicits._
val data = Seq(("BNG", "school"), ("HYD", "school,res"), ("MUM", "school,res,hos")).toDF("city", "types")

+----+--------------+
|city|         types|
+----+--------------+
| BNG|        school|
| HYD|    school,res|
| MUM|school,res,hos|
+----+--------------+

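import org.apache.spark.sql.functions.{array_contains, col, split}
// Split once, then add one boolean flag column per known type
val typesArr = split(col("types"), ",")
data.withColumn("isSchool", array_contains(typesArr, "school")).withColumn("isRes", array_contains(typesArr, "res")).withColumn("isHos", array_contains(typesArr, "hos")).show()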

+----+--------------+--------+-----+-----+
|city|         types|isSchool|isRes|isHos|
+----+--------------+--------+-----+-----+
| BNG|        school|    true|false|false|
| HYD|    school,res|    true| true|false|
| MUM|school,res,hos|    true| true| true|
+----+--------------+--------+-----+-----+
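The answer above hardcodes the three types. A minimal sketch that instead derives the type list from the data, assuming Spark 3.x and that the number of distinct types is small enough to collect to the driver. The object `TypeFlags`, the helper `addTypeFlags`, and the `is_<type>` column naming are hypothetical, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array_contains, col, explode, split}

object TypeFlags {
  // Adds one boolean column "is_<type>" per distinct value found in the
  // comma-separated column `colName`.
  def addTypeFlags(df: DataFrame, colName: String): DataFrame = {
    val parts = split(col(colName), ",")
    // Collect distinct type names to the driver (assumes the set is small);
    // empty strings from inputs like "a,,b" are skipped.
    val types = df.select(explode(parts).as("t")).distinct()
      .collect().map(_.getString(0)).filter(_.nonEmpty).sorted
    // Same idea as the answer: one array_contains flag per type
    types.foldLeft(df) { (acc, t) =>
      acc.withColumn(s"is_$t", array_contains(parts, t))
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("flags-demo").getOrCreate()
    import spark.implicits._
    val data = Seq(("BNG", "school"), ("HYD", "school,res"), ("MUM", "school,res,hos")).toDF("city", "types")
    addTypeFlags(data, "types").show()
    spark.stop()
  }
}
```

Collecting the distinct types costs an extra job; when the full set of types is known up front, hardcoding it as in the answer avoids that pass.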


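Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.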

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM