
Replace all values of a column in a dataframe with pyspark

I am looking to replace all the values of a column in a Spark DataFrame with a particular value. I am using pyspark. I tried something like:

new_df = df.withColumn('column_name',10)

Here I want to replace all the values in the column column_name with 10. In pandas this could be done with df['column_name'] = 10. I am unable to figure out how to do the same in Spark.

You can use a UDF to replace the value, and currying lets you parameterize the UDF so it supports different replacement values.

from pyspark.sql.functions import udf, col
from pyspark.sql.types import IntegerType

def replacerUDF(value):
    # Currying: return a UDF that ignores its input and always yields `value`.
    # The return type is given explicitly; udf defaults to StringType otherwise.
    return udf(lambda x: value, IntegerType())

new_df = df.withColumn("column_name", replacerUDF(10)(col("column_name")))
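Note that a plain Python UDF is relatively expensive here, since every row is serialized to and from a Python worker just to produce a constant.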

It might be easier to use lit as follows:

from pyspark.sql.functions import lit
new_df = df.withColumn('column_name', lit(10))
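For completeness, below is a minimal end-to-end sketch of the lit approach. It assumes a local SparkSession, and the sample data and the second column are made up purely for illustration:

from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

# Hypothetical local session and sample data, for demonstration only.
spark = SparkSession.builder.master("local[*]").getOrCreate()
df = spark.createDataFrame([("a", 1), ("b", 2)], ["column_name", "other"])

# withColumn overwrites the column because "column_name" already exists.
new_df = df.withColumn("column_name", lit(10))
new_df.show()
# +-----------+-----+
# |column_name|other|
# +-----------+-----+
# |         10|    1|
# |         10|    2|
# +-----------+-----+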
