I need to update a PySpark DataFrame when a column contains a certain substring.

For example, df looks like this:

id  address
1   spring-field_garden
2   spring-field_lane
3   new_berry place

If the address column contains spring-field_, the whole value should be replaced with spring-field.
Expected result:

id  address
1   spring-field
2   spring-field
3   new_berry place
I tried:

df = df.withColumn('address', F.regexp_replace(F.col('address'), 'spring-field_*', 'spring-field'))

but it does not produce the expected result.
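For reference, the attempt above fails because in a regular expression _* means "zero or more underscores", so regexp_replace only replaces the spring-field_ prefix and leaves the rest of the string intact. The behaviour can be reproduced with Python's re module, which follows the same semantics as Spark's Java regex engine here (a plain-Python sketch, no Spark session required):

```python
import re

# '_*' means "zero or more underscores", so the match stops right after
# 'spring-field_' and the trailing text survives the replacement.
print(re.sub(r'spring-field_*', 'spring-field', 'spring-field_garden'))
# -> spring-fieldgarden  (not 'spring-field')

# Matching the rest of the string as well gives the intended result:
print(re.sub(r'spring-field_.*', 'spring-field', 'spring-field_garden'))
# -> spring-field
```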
You can use like with a when expression. Note that in a SQL LIKE pattern the underscore is a single-character wildcard, so it is escaped below to match a literal underscore:

from pyspark.sql import functions as F

df = df.withColumn(
    'address',
    F.when(
        F.col('address').like(r'%spring-field\_%'),
        F.lit('spring-field')
    ).otherwise(F.col('address'))
)
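As a side note, in SQL LIKE patterns % matches any sequence of characters and an unescaped _ matches exactly one character. Translating the pattern into an equivalent Python regular expression shows why an unescaped underscore would also match other characters (a plain-Python sketch of the LIKE semantics, no Spark required):

```python
import re

# SQL LIKE '%spring-field_%' translated to a regex:
# '%' -> '.*' (any sequence), '_' -> '.' (any single character)
like_as_regex = re.compile(r'.*spring-field..*')

print(bool(like_as_regex.fullmatch('spring-field_garden')))  # True: underscore matches
print(bool(like_as_regex.fullmatch('spring-fieldXgarden')))  # True: any character matches too
print(bool(like_as_regex.fullmatch('new_berry place')))      # False: no match
```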
You can use a regex that consumes the entire value:

df = df.withColumn(
    'address',
    F.regexp_replace('address', r'.*spring-field.*', 'spring-field')
)
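The pattern .*spring-field.* matches the whole value whenever it contains spring-field, so the entire string is replaced; rows without the substring are left untouched because regexp_replace only rewrites matching portions. This is again verifiable with plain Python re, which behaves the same as Spark's Java regex for this pattern:

```python
import re

rows = ['spring-field_garden', 'spring-field_lane', 'new_berry place']
cleaned = [re.sub(r'.*spring-field.*', 'spring-field', r) for r in rows]
print(cleaned)
# -> ['spring-field', 'spring-field', 'new_berry place']
```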
Alternatively, you can use the contains method:

df = df.withColumn(
    'address',
    F.when(
        F.col('address').contains('spring-field'), 'spring-field'
    ).otherwise(F.col('address'))
)