Replace string if it contains certain substring in PySpark
I need to update a column in a PySpark dataframe if its value contains a certain substring.
For example:
df looks like:
id address
1 spring-field_garden
2 spring-field_lane
3 new_berry place
If the address column contains spring-field_, just replace the whole value with spring-field.
Expected result:
id address
1 spring-field
2 spring-field
3 new_berry place
Tried:
df = df.withColumn('address',F.regexp_replace(F.col('address'), 'spring-field_*', 'spring-field'))
This does not seem to work.
You can use like with a when expression:
from pyspark.sql import functions as F

# In a LIKE pattern, % matches any sequence of characters and _ matches any single character
df = df.withColumn(
    'address',
    F.when(
        F.col('address').like('%spring-field_%'),
        F.lit('spring-field')
    ).otherwise(F.col('address'))
)
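One detail worth knowing about the like pattern above: the trailing _ is a single-character wildcard, not a literal underscore. The sketch below emulates that behaviour in plain Python so it can be tried without a Spark session. The helpers like_to_regex and sql_like are hypothetical names made up for this illustration, they ignore escape characters, and they are only an approximation of SQL LIKE semantics, not Spark's implementation.

```python
import re

def like_to_regex(pattern: str) -> str:
    """Hypothetical helper: translate a SQL LIKE pattern into a regex (no escape handling)."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # % matches any sequence of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return "".join(parts)

def sql_like(value: str, pattern: str) -> bool:
    """Hypothetical helper: True if the whole value matches the LIKE pattern."""
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None

pattern = "%spring-field_%"
for value in ["spring-field_garden", "spring-fields", "spring-field", "new_berry place"]:
    print(value, sql_like(value, pattern))
```

Under these assumptions, a value such as spring-fields would also match, because _ accepts any character after spring-field. For the sample data in the question this makes no difference.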
You can use the following regex:
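from pyspark.sql import functions as F

# .* on both sides makes the pattern cover the entire value, so the whole address is replaced
df = df.withColumn(
    'address',
    F.regexp_replace('address', r'.*spring-field.*', 'spring-field')
)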
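To see why the pattern from the question falls short while this one works, here is a small sketch that applies both to the sample addresses. It uses Python's re module as a stand-in for the Java regex engine behind regexp_replace. For patterns this simple the two should behave the same, but that equivalence is an assumption of the sketch.

```python
import re

addresses = ["spring-field_garden", "spring-field_lane", "new_berry place"]

# Pattern from the question: matches only the prefix plus any underscores,
# so the rest of the value ("garden", "lane") is left in place.
attempt = [re.sub(r"spring-field_*", "spring-field", a) for a in addresses]

# Pattern from this answer: .* on both sides consumes the entire value.
fixed = [re.sub(r".*spring-field.*", "spring-field", a) for a in addresses]

print(attempt)  # ['spring-fieldgarden', 'spring-fieldlane', 'new_berry place']
print(fixed)    # ['spring-field', 'spring-field', 'new_berry place']
```

In other words, the original attempt did run, but it only removed the underscore instead of replacing the whole address.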
Alternatively, you can use the contains method:
# contains() does a literal substring match, so no wildcard or regex escaping is involved
df = df.withColumn(
    'address',
    F.when(
        F.col('address').contains("spring-field"), "spring-field"
    ).otherwise(F.col('address'))
)