How to remove a specific character (") from a string in a Spark DataFrame in Python?
I have the following Spark DataFrame:
description
"""Piotr is ""running"
"""Leo is ""running"
"""Marta is ""running"
I want the following Spark DataFrame:
description
"Piotr is ""running
"""Leo is ""running"
"""Marta is ""running"
The logic is: if Piotr is in the string, the first two " characters and the last " character are removed. All other rows stay unchanged.
You can apply a conditional substr. For the conditional clause you can use when, with otherwise returning the original value for rows that do not match.
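from pyspark.sql import SparkSession, functions as F

# Reuses the active session if one exists (e.g. in a notebook or the pyspark shell).
spark = SparkSession.builder.getOrCreate()
data = [('"""Piotr is ""running"', ), ('"""Leo is ""running"',), ('"""Marta is ""running"', )]
df = spark.createDataFrame(data, ("description", ))
# substr is 1-based: start at character 3 and keep length - 3 characters,
# which drops the first two characters and the last one.
df.withColumn(
    "description",
    F.when(
        F.col("description").contains("Piotr"),
        F.col("description").substr(F.lit(3), F.length(F.col("description")) - F.lit(3)),
    ).otherwise(F.col("description")),
).show(200, False)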
+----------------------+
|description           |
+----------------------+
|"Piotr is ""running   |
|"""Leo is ""running"  |
|"""Marta is ""running"|
+----------------------+
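To see why substr(3, length - 3) removes exactly the first two characters and the last one, the same index arithmetic can be checked in plain Python without a Spark session. This is only a sketch: spark_substr and strip_quotes_if are hypothetical helper names introduced here for illustration, and spark_substr mimics the 1-based behaviour of Column.substr only for start positions of 1 or greater.

```python
def spark_substr(s: str, pos: int, length: int) -> str:
    """Hypothetical helper: mimic a 1-based substr(pos, length) for pos >= 1."""
    start = pos - 1  # convert the 1-based position to a 0-based index
    return s[start:start + length]


def strip_quotes_if(s: str, keyword: str = "Piotr") -> str:
    """Drop the first two characters and the last one when keyword is present."""
    if keyword in s:
        # Start at character 3 and keep len(s) - 3 characters:
        # 2 are skipped at the front, 1 is left off at the end.
        return spark_substr(s, 3, len(s) - 3)
    return s


rows = [
    '"""Piotr is ""running"',
    '"""Leo is ""running"',
    '"""Marta is ""running"',
]
result = [strip_quotes_if(r) for r in rows]

for r in result:
    print(r)
```

The first string has 22 characters, so the slice keeps 19 characters starting at the third one. Rows without the keyword pass through untouched, which matches the otherwise branch in the Spark version.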