When querying a dataframe using Spark/Scala locally, how to change the output of values in a column?
I'm using Spark/Scala locally to transform JSON files into a dataframe. My current dataframe has a column with 'Male' and 'Female' values, shown below. I want to change every 'Male' in the dataframe to 'M', and likewise 'Female' to 'F', using Spark SQL.

So far I have:
val results = spark.sql("SELECT name, CASE WHEN gender = 'Male' THEN 'M' WHEN gender = 'Female' THEN 'F' ELSE 'Unknown' END FROM ocupation_table")
but it's creating a separate column, and I want it to replace the values in the existing 'gender' column.
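One way to keep the original column name in pure Spark SQL is to alias the CASE expression back to `gender`. A minimal sketch, assuming the same `ocupation_table` is registered as a temp view:

```scala
// Aliasing the CASE expression with "AS gender" makes the result column
// replace the original name instead of appearing as an unnamed extra column.
val results = spark.sql("""
  SELECT name,
         CASE WHEN gender = 'Male'   THEN 'M'
              WHEN gender = 'Female' THEN 'F'
              ELSE 'Unknown'
         END AS gender
  FROM ocupation_table
""")
```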
You can use Spark's withColumn(...) method to achieve this. It will replace a named column if it already exists. Something like this should do the trick:
import org.apache.spark.sql.functions.substring

val results = df.withColumn("gender", substring(df("gender"), 0, 1))
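Note that substring simply takes the first letter, so any unexpected value would also be abbreviated rather than mapped to 'Unknown'. A more explicit alternative, sketched here assuming the same df, maps each value with when/otherwise:

```scala
import org.apache.spark.sql.functions.{col, when}

// Map the known values explicitly; anything else becomes "Unknown".
val results = df.withColumn(
  "gender",
  when(col("gender") === "Male", "M")
    .when(col("gender") === "Female", "F")
    .otherwise("Unknown")
)
```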