How to remove a specific character (") from a string in a Spark DataFrame in Python?

I have the following Spark DataFrame:

   description

"""Piotr is ""running"
"""Leo is ""running"
"""Marta is ""running"

I want the following Spark DataFrame:

   description

"Piotr is ""running
"""Leo is ""running"
"""Marta is ""running"

The logic is: if Piotr is in the string, the first two " and the last " are removed.

You can apply substr conditionally; for the conditional clause, use when (with otherwise to leave non-matching rows unchanged).

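from pyspark.sql import SparkSession, functions as F

# Reuse the active session (e.g. in a notebook or the pyspark shell) or create one.
spark = SparkSession.builder.getOrCreate()

data = [('"""Piotr is ""running"',), ('"""Leo is ""running"',), ('"""Marta is ""running"',)]

df = spark.createDataFrame(data, ("description",))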

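# substr is 1-based: start at position 3 and keep length - 3 characters,
# which drops the first two characters and the last one.
trimmed = F.col("description").substr(F.lit(3), F.length(F.col("description")) - F.lit(3))

df.withColumn(
    "description",
    F.when(F.col("description").contains("Piotr"), trimmed)
     .otherwise(F.col("description")),
).show(200, False)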

Output:

+----------------------+
|description           |
+----------------------+
|"Piotr is ""running   |
|"""Leo is ""running"  |
|"""Marta is ""running"|
+----------------------+
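
To see why a start of 3 and a length of length - 3 give this result, the same arithmetic can be checked in plain Python without a Spark session. This is only a sketch of the logic, not part of the answer's code: the helper name strip_quotes_if_match and its needle parameter are made up for illustration, and the slice converts Spark's 1-based (start, length) pair into Python's 0-based indexing.

```python
def strip_quotes_if_match(text: str, needle: str = "Piotr") -> str:
    """Hypothetical plain-Python mirror of the when/substr expression."""
    if needle not in text:
        # Same role as .otherwise(...): leave non-matching rows unchanged.
        return text
    start = 3                # 1-based start position, as passed to substr
    length = len(text) - 3   # two leading characters plus one trailing character dropped
    # Convert the 1-based (start, length) pair into a 0-based Python slice.
    return text[start - 1 : start - 1 + length]


rows = ['"""Piotr is ""running"', '"""Leo is ""running"', '"""Marta is ""running"']
result = [strip_quotes_if_match(row) for row in rows]

for row in result:
    print(row)
```

Note that this approach trims by position: it removes whatever characters sit in the first two positions and the last one, without checking that they are actually double quotes. That matches the sample data, but rows with a different layout would need an explicit check.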
