
When querying a dataframe using spark/scala locally, how to change output of values in a column?

I'm using Spark/Scala locally to transform JSON files into a DataFrame.

My current dataframe has a column with 'Male' and 'Female' values, shown below. I want to change every 'Male' in the dataframe to 'M', and likewise 'Female' to 'F', using Spark SQL.

So far I have:

val results = spark.sql("SELECT name, CASE WHEN gender = 'Male' THEN 'M' WHEN gender = 'Female' THEN 'F' ELSE 'Unknown' END FROM ocupation_table")

but it's creating a separate column, and I want it to replace the values in the existing 'gender' column instead.
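One way to keep the converted values in the existing column is to alias the CASE expression back to gender, so the projected column carries the original name. A minimal runnable sketch, assuming a local SparkSession; the sample rows and the registered ocupation_table view are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

object CaseWhenDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("case-when-demo").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the JSON-derived dataframe.
    val df = Seq(("alice", "Female"), ("bob", "Male"), ("eve", "Other")).toDF("name", "gender")
    df.createOrReplaceTempView("ocupation_table")

    // Aliasing the CASE expression AS gender keeps the original column name,
    // so the result has a single 'gender' column with the mapped values.
    val results = spark.sql(
      """SELECT name,
        |       CASE WHEN gender = 'Male' THEN 'M'
        |            WHEN gender = 'Female' THEN 'F'
        |            ELSE 'Unknown' END AS gender
        |FROM ocupation_table""".stripMargin)

    results.show()
    spark.stop()
  }
}
```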


You can use Spark's withColumn(...) method to achieve this. It will replace a named column if it already exists. Something like this should do the trick:

import org.apache.spark.sql.functions.substring

// Take the first character of 'gender': 'Male' -> 'M', 'Female' -> 'F'.
val results = df.withColumn("gender", substring(df("gender"), 0, 1))
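The substring trick maps 'Male' to 'M' and 'Female' to 'F', but it has no 'Unknown' fallback for other values. Spark's when/otherwise functions cover that while still overwriting the column in place. A sketch under the same assumptions as above (the sample rows are invented for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

object WithColumnDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("with-column-demo").getOrCreate()
    import spark.implicits._

    // Hypothetical sample data standing in for the JSON-derived dataframe.
    val df = Seq(("alice", "Female"), ("bob", "Male"), ("eve", "Other")).toDF("name", "gender")

    // withColumn("gender", ...) replaces the existing column; when/otherwise
    // supplies an explicit fallback for unmatched values.
    val results = df.withColumn("gender",
      when(col("gender") === "Male", "M")
        .when(col("gender") === "Female", "F")
        .otherwise("Unknown"))

    results.show()
    spark.stop()
  }
}
```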

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM