How to remove a specific character (") from a string in a Spark DataFrame in Python?

Question

I have the following Spark data frame:

   description

"""Piotr is ""running"
"""Leo is ""running"
"""Marta is ""running"

I want the following Spark data frame:

   description

"Piotr is ""running
"""Leo is ""running"
"""Marta is ""running"

The logic: if Piotr is in the string, the first two " characters and the last " should be removed. All other rows stay unchanged.

Answer 1

You can apply substr conditionally, using when for the condition and otherwise to leave the remaining rows untouched.

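from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Reuse the active session, or create one when running as a standalone script.
spark = SparkSession.builder.getOrCreate()

data = [('"""Piotr is ""running"',), ('"""Leo is ""running"',), ('"""Marta is ""running"',)]
df = spark.createDataFrame(data, ("description",))

# substr is 1-based: start at position 3 (skipping the first two quotes) and
# keep length - 3 characters, which also drops the final quote.
df.withColumn(
    "description",
    F.when(
        F.col("description").contains("Piotr"),
        F.col("description").substr(F.lit(3), F.length(F.col("description")) - F.lit(3)),
    ).otherwise(F.col("description")),
).show(200, False)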

Output

+----------------------+
|description           |
+----------------------+
|"Piotr is ""running   |
|"""Leo is ""running"  |
|"""Marta is ""running"|
+----------------------+
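
To see why a start position of 3 and a length of length - 3 removes exactly the first two quotes and the last one, the index arithmetic can be modelled in plain Python. This is only a sketch: spark_substr and strip_piotr_quotes are hypothetical helper names, not PySpark functions, and the model assumes a start position of 1 or greater (Spark's substr counts positions from 1).

```python
def spark_substr(s: str, pos: int, length: int) -> str:
    """Hypothetical model of a 1-based substring for pos >= 1."""
    start = pos - 1  # convert the 1-based position to a 0-based index
    return s[start:start + length]


def strip_piotr_quotes(description: str) -> str:
    """Drop the first two characters and the last one when 'Piotr' is present."""
    if "Piotr" in description:
        # Starting at position 3 skips 2 characters; keeping len - 3
        # characters therefore leaves out 1 more character at the end.
        return spark_substr(description, 3, len(description) - 3)
    return description


rows = ['"""Piotr is ""running"', '"""Leo is ""running"', '"""Marta is ""running"']
for row in rows:
    print(strip_piotr_quotes(row))
```

Only the Piotr row changes. Note that this approach, like the answer's, removes characters by position and does not check that they are actually quotes.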
