Replace string if it contains certain substring in PySpark

I need to update a PySpark DataFrame column if it contains a certain substring.

For example, df looks like:

id      address
1       spring-field_garden
2       spring-field_lane
3       new_berry place

If the address column contains spring-field_, replace the whole value with spring-field.

Expected result:

id      address
1       spring-field
2       spring-field
3       new_berry place

Tried:

df = df.withColumn('address',F.regexp_replace(F.col('address'), 'spring-field_*', 'spring-field'))

This does not seem to work.
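
The attempt leaves the rest of the value in place because, in a regex, `_*` means "zero or more underscores", so the match stops right after the underscore. Here is a minimal sketch of that behavior. It uses Python's re module as a stand-in for Spark's Java regex engine (an assumption; the two agree for patterns this simple), and replace_all is a hypothetical helper, not a PySpark function:

```python
import re

addresses = ["spring-field_garden", "spring-field_lane", "new_berry place"]

def replace_all(pattern, replacement, values):
    """Apply re.sub to every value, mimicking a per-row regexp_replace."""
    return [re.sub(pattern, replacement, v) for v in values]

# Original pattern: "_*" consumes only the underscores, not the text after them.
attempt = replace_all(r"spring-field_*", "spring-field", addresses)
print(attempt)

# Appending ".*" lets the match run to the end of the value.
fixed = replace_all(r"spring-field_.*", "spring-field", addresses)
print(fixed)
```

With the original pattern only the underscore is dropped. With the trailing `.*` everything after the substring is replaced as well.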

You can use like with a when expression:

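from pyspark.sql import functions as F

# Replace the whole address when it matches the LIKE pattern; keep it otherwise.
# Note: in a LIKE pattern "%" matches any sequence of characters and "_" matches
# any single character, so this matches "spring-field" followed by at least one character.
df = df.withColumn(
    'address',
    F.when(
        F.col('address').like('%spring-field_%'),
        F.lit('spring-field')
    ).otherwise(F.col('address'))
)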

You can use the following regex:

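from pyspark.sql import functions as F

# ".*" on both sides makes the match span the whole value, so all of it is replaced.
df = df.withColumn(
    'address',
    F.regexp_replace('address', r'.*spring-field.*', 'spring-field')
)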
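
Note that this pattern does not require the trailing underscore from the question, so any value that merely contains spring-field is replaced. The sketch below compares it with a stricter variant. Python's re module again stands in for Spark's regex engine (an assumption), and "old spring-field road" is a made-up sample value that is not in the original data:

```python
import re

samples = ["spring-field_garden", "old spring-field road", "new_berry place"]

# Pattern from the answer: any value containing "spring-field" is replaced whole.
broad = [re.sub(r".*spring-field.*", "spring-field", s) for s in samples]
print(broad)

# Stricter variant (an assumption about the intent): require the underscore too.
strict = [re.sub(r".*spring-field_.*", "spring-field", s) for s in samples]
print(strict)
```

Which variant is appropriate depends on whether values such as the made-up one above should be rewritten.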

Alternatively, you can use the contains method:

# contains() is a plain substring test; no regex or wildcard characters are involved.
df = df.withColumn(
    'address',
    F.when(
        F.col('address').contains("spring-field"), "spring-field"
    ).otherwise(F.col('address'))
)
