How to remove specific character (") from string in spark df in python?
I have the following Spark data frame:
description
"""Piotr is ""running"
"""Leo is ""running"
"""Marta is ""running"
I want the following Spark data frame:
description
"Piotr is ""running
"""Leo is ""running"
"""Marta is ""running"
The logic is: if Piotr is in the string, the first two " characters and the last " character are removed.
You can apply substr conditionally; for the conditional clause, you can use when.
from pyspark.sql import functions as F

# Assumes an active SparkSession named `spark`.
data = [('"""Piotr is ""running"',), ('"""Leo is ""running"',), ('"""Marta is ""running"',)]
df = spark.createDataFrame(data, ("description",))

# substr is 1-based: start at the 3rd character and keep (length - 3) characters,
# which drops the first two quotes and the last one.
df.withColumn(
    "description",
    F.when(
        F.col("description").contains("Piotr"),
        F.col("description").substr(F.lit(3), F.length(F.col("description")) - F.lit(3)),
    ).otherwise(F.col("description")),
).show(200, False)
+----------------------+
|description           |
+----------------------+
|"Piotr is ""running   |
|"""Leo is ""running"  |
|"""Marta is ""running"|
+----------------------+
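To see why the arguments are 3 and length - 3, it may help to reproduce the same arithmetic in plain Python, without Spark. The sketch below is only an illustration: spark_substr and strip_quotes_if_piotr are hypothetical helper names, not part of PySpark, and spark_substr only mimics the 1-based (start, length) behaviour of Column.substr for a positive start.

```python
def spark_substr(s: str, start: int, length: int) -> str:
    """Mimic Column.substr(start, length) for a positive, 1-based start (hypothetical helper)."""
    begin = start - 1  # convert the 1-based position to a 0-based index
    return s[begin:begin + length]


def strip_quotes_if_piotr(s: str) -> str:
    """Drop the first two characters and the last one, but only when 'Piotr' occurs."""
    if "Piotr" in s:
        # Start at the 3rd character and keep len(s) - 3 characters:
        # two characters are skipped at the front, one is left off at the end.
        return spark_substr(s, 3, len(s) - 3)
    return s


rows = ['"""Piotr is ""running"', '"""Leo is ""running"', '"""Marta is ""running"']
result = [strip_quotes_if_piotr(r) for r in rows]
for r in result:
    print(r)
```

For a matching row this is the same as the Python slice s[2:-1]; rows without Piotr are returned unchanged.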