
How to change the column values of a DataFrame into title case in Scala

Input DataFrame:

val ds = Seq((1,"play framework"),
  (2,"spark framework"),
  (3,"spring framework ")).toDF("id","subject")

I expect the subject column to be in title case, as follows:

val ds = Seq((1,"Play Framework"),
  (2,"Spark Framework"),
  (3,"Spring Framework ")).toDF("id","subject")

I could use the lower function from org.apache.spark.sql.functions, like

ds.select($"subject", lower($"subject")).show

to convert the column to lower case. But how can I get the title-case result shown above?

There is a built-in function called initcap that does exactly what you need:

import org.apache.spark.sql.functions._
ds.withColumn("subject", initcap(col("subject"))).show(false)

The official documentation describes it as follows:

public static Column initcap(Column e)
Returns a new string column by converting the first letter of each word to uppercase. Words are delimited by whitespace.
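To try initcap outside a spark-shell, the call above can be placed in a small standalone program. The following is only a sketch under some assumptions: it runs against a local SparkSession, the object name InitcapExample and the helper titleCasedSubjects are hypothetical, and the second row uses an upper-case variant ("SPARK framework") that is not in the question's data, added to show that initcap also lower-cases the remaining letters of each word.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, initcap}

object InitcapExample {
  // Hypothetical helper: returns the title-cased subjects ordered by id.
  def titleCasedSubjects(spark: SparkSession): Seq[String] = {
    import spark.implicits._

    val ds = Seq(
      (1, "play framework"),
      (2, "SPARK framework"), // upper-case variant, not in the question's data
      (3, "spring framework ")
    ).toDF("id", "subject")

    // initcap upper-cases the first letter of each word and lower-cases the rest.
    ds.withColumn("subject", initcap(col("subject")))
      .orderBy("id")
      .select("subject")
      .as[String]
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only; the app name is arbitrary.
    val spark = SparkSession.builder()
      .appName("initcap-example")
      .master("local[*]")
      .getOrCreate()
    try titleCasedSubjects(spark).foreach(println)
    finally spark.stop()
  }
}
```

Because initcap is a built-in column expression, Spark can optimize it like any other SQL function, which is usually a reason to prefer it over a UDF when its behavior fits.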

You can do it with a UDF like this:

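import org.apache.spark.sql.functions.udf
// Capitalize the first letter of each space-separated word; null values pass through unchanged.
val capitalizeUDF = udf((str: String) => if (str == null) null else str.split(" ").map(word => word.trim.capitalize).mkString(" "))

ds.select($"id", capitalizeUDF($"subject").alias("subject")).show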

or, using the built-in initcap function:

ds.select($"id",initcap($"subject").alias("subject")).show

Sample output:

+---+----------------+
| id|         subject|
+---+----------------+
|  1|  Play Framework|
|  2| Spark Framework|
|  3|Spring Framework|
+---+----------------+
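The capitalization logic used inside a UDF like the one above can also be kept as a plain Scala function, which makes it easy to check without starting Spark. The sketch below is one possible version, not the answer's exact code: the names TitleCase and titleCase are hypothetical, and it makes two assumptions of its own, namely that the rest of each word should be lower-cased (as initcap does) and that runs of whitespace should collapse to a single space.

```scala
object TitleCase {
  // Hypothetical helper, not part of Spark.
  // Title-cases each whitespace-separated word and returns null for null input,
  // so it stays safe if later wrapped in a UDF over a nullable column.
  def titleCase(str: String): String =
    if (str == null) null
    else
      str.trim
        .split("\\s+")                      // split on any run of whitespace
        .filter(_.nonEmpty)                 // an empty input yields no words
        .map(w => w.toLowerCase.capitalize) // "SPARK" becomes "Spark"
        .mkString(" ")

  def main(args: Array[String]): Unit = {
    Seq("play framework", "SPARK  framework", "spring framework ", null)
      .foreach(s => println(titleCase(s)))
  }
}
```

With Spark on the classpath, such a function could be registered as udf(TitleCase.titleCase _) and applied to the subject column in the same way as the UDF above.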


 