Replace a string in PySpark if it contains a certain substring

Question

I need to update a column in a PySpark DataFrame when its value contains a certain substring.

For example:

df looks like:

id      address
1       spring-field_garden
2       spring-field_lane
3       new_berry place

If the address column contains spring-field_, replace the whole value with spring-field.

Expected result:

id      address
1       spring-field
2       spring-field
3       new_berry place

Tried:

df = df.withColumn('address',F.regexp_replace(F.col('address'), 'spring-field_*', 'spring-field'))

This does not seem to work.

Answer 1

You can use like with a when expression:

from pyspark.sql import functions as F

df = df.withColumn(
    'address',
    F.when(
        F.col('address').like('%spring-field_%'),
        F.lit('spring-field')
    ).otherwise(F.col('address'))
)
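One caveat about the pattern above: in SQL LIKE, `_` is a single-character wildcard, so '%spring-field_%' also matches values such as "spring-fields road". The sketch below illustrates this with SQLite from the Python standard library as a stand-in, not Spark itself. The helper `sql_like` and the sample value "spring-fields road" are made up for illustration. SQLite's LIKE is case-insensitive for ASCII, unlike Spark's, which does not matter for these lowercase samples.

```python
import sqlite3


def sql_like(value, pattern, escape=None):
    """Evaluate `value LIKE pattern` in an in-memory SQLite database.

    Hypothetical helper used only to show LIKE wildcard semantics.
    """
    conn = sqlite3.connect(":memory:")
    try:
        if escape is None:
            row = conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()
        else:
            row = conn.execute(
                "SELECT ? LIKE ? ESCAPE ?", (value, pattern, escape)
            ).fetchone()
        return bool(row[0])
    finally:
        conn.close()


# "_" matches any single character, so both values match the answer's pattern.
loose_on_underscore = sql_like("spring-field_garden", "%spring-field_%")
loose_on_other_char = sql_like("spring-fields road", "%spring-field_%")

# Escaping the underscore restricts the match to a literal "_".
strict_on_underscore = sql_like("spring-field_lane", "%spring-field\\_%", escape="\\")
strict_on_other_char = sql_like("spring-fields road", "%spring-field\\_%", escape="\\")

print(loose_on_underscore, loose_on_other_char)
print(strict_on_underscore, strict_on_other_char)
```

If a literal underscore is required in PySpark, the pattern would presumably be written as '%spring-field\\_%', on the assumption that backslash is the LIKE escape character in your Spark version. That is worth checking against the Spark documentation before relying on it.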

Answer 2

You can use the following regex:

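from pyspark.sql import functions as F

# Assign the result back: withColumn returns a new DataFrame
df = df.withColumn(
    'address',
    F.regexp_replace('address', r'.*spring-field.*', 'spring-field')
)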
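To see why the pattern from the question falls short, note that 'spring-field_*' means "spring-field followed by zero or more underscores", so only the underscore is removed and the rest of the address stays. The sketch below compares the two patterns using Python's `re` module as an approximation of Spark's Java regex engine. For patterns this simple the two should behave the same, but that equivalence is an assumption. The helper `replace_all` is made up for illustration.

```python
import re

addresses = ["spring-field_garden", "spring-field_lane", "new_berry place"]

# Pattern from the question: "_*" matches zero or more underscores only.
question_pattern = r"spring-field_*"
# Pattern from this answer: ".*" on both sides consumes the whole value.
answer_pattern = r".*spring-field.*"


def replace_all(pattern, values, replacement="spring-field"):
    """Apply the regex replacement to each value (hypothetical helper)."""
    return [re.sub(pattern, replacement, value) for value in values]


question_result = replace_all(question_pattern, addresses)
answer_result = replace_all(answer_pattern, addresses)

print(question_result)  # ['spring-fieldgarden', 'spring-fieldlane', 'new_berry place']
print(answer_result)    # ['spring-field', 'spring-field', 'new_berry place']
```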

Alternatively, you can use the contains method:

df = df.withColumn(
    'address',
    F.when(
        F.col('address').contains("spring-field"), "spring-field"
    ).otherwise(F.col('address'))
)
