Split single String column to multiple columns in Spark-Scala

I have a DataFrame like this:

+----+--------------------------+
|city|Types                     |
+----+--------------------------+
|BNG |school                    |
|HYD |school,restaurant         |
|MUM |school,restaurant,hospital|
+----+--------------------------+

I want to split the Types column into multiple columns on ','.

The problem is that the number of values per row is not fixed, so I don't know how to do it.

I saw a related question for PySpark, but I want to do this in Spark with Scala, not PySpark.

Any help is appreciated.

Thanks in advance.

One way to handle the varying number of values per row is to change the representation: instead of one column per position, add one boolean column per type.

For example:

import org.apache.spark.sql.functions.{array_contains, col, split}

val data = Seq(("BNG", "school"), ("HYD", "school,res"), ("MUM", "school,res,hos")).toDF("city", "types")

+----+--------------+
|city|         types|
+----+--------------+
| BNG|        school|
| HYD|    school,res|
| MUM|school,res,hos|
+----+--------------+

data
  .withColumn("isSchool", array_contains(split(col("types"), ","), "school"))
  .withColumn("isRes", array_contains(split(col("types"), ","), "res"))
  .withColumn("isHos", array_contains(split(col("types"), ","), "hos"))

+----+--------------+--------+-----+-----+
|city|         types|isSchool|isRes|isHos|
+----+--------------+--------+-----+-----+
| BNG|        school|    true|false|false|
| HYD|    school,res|    true| true|false|
| MUM|school,res,hos|    true| true| true|
+----+--------------+--------+-----+-----+
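If positional columns are really what is needed, as the question asks, another option is to compute the longest list first and then pick each array element by index. The following is only a sketch under some assumptions: the helper name splitTypes and the output column names type_1, type_2, ... are made up for illustration, the input is assumed to be non-empty, and finding the maximum costs one extra pass over the data. The when(...) guard is there so that rows with fewer values get null rather than depending on how out-of-range array indexes are handled by the Spark version and settings in use.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max, size, split, when}

object SplitTypes {
  // Hypothetical helper (not from the original answer): turns the
  // comma-separated "Types" column into type_1 .. type_N columns.
  def splitTypes(df: DataFrame): DataFrame = {
    val withParts = df.withColumn("parts", split(col("Types"), ","))

    // Extra pass over the data to find the longest list.
    // Assumes df has at least one row.
    val maxParts = withParts.agg(max(size(col("parts")))).head().getInt(0)

    // One column per position; when(...) without otherwise(...) yields null
    // for rows that have fewer than i + 1 values.
    val typeCols = (0 until maxParts).map { i =>
      when(size(col("parts")) > i, col("parts").getItem(i)).as(s"type_${i + 1}")
    }

    withParts.select(col("city") +: typeCols: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("split-types")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      ("BNG", "school"),
      ("HYD", "school,restaurant"),
      ("MUM", "school,restaurant,hospital")
    ).toDF("city", "Types")

    splitTypes(df).show(false)
    spark.stop()
  }
}
```

With the sample data from the question this should give three columns, type_1 to type_3, with null in the positions a city does not have. The trade-off compared with the boolean flags above is that the number of columns depends on the data, so the schema can change from one run to the next.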
