How to remove a specific character (") from strings in a Spark DataFrame in Python?

Question

I have the following Spark DataFrame:

   description

"""Piotr is ""running"
"""Leo is ""running"
"""Marta is ""running"

I want to turn it into the following Spark DataFrame:

   description

"Piotr is ""running
"""Leo is ""running"
"""Marta is ""running"

The logic: if the string contains Piotr, remove its first two " characters and its last ". All other rows stay unchanged.
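For reference, the rule can be stated as a minimal plain-Python sketch, which is handy for checking what the Spark expression should produce. The helper name clean_description is hypothetical, and the sketch assumes the rule is purely positional (drop the first two characters and the last one), which is also what the substr-based answer does.

```python
def clean_description(s: str) -> str:
    """Hypothetical helper: if the string mentions Piotr, drop the first two
    characters and the last character; otherwise return it unchanged."""
    if "Piotr" in s:
        return s[2:-1]
    return s


rows = ['"""Piotr is ""running"', '"""Leo is ""running"', '"""Marta is ""running"']
for row in rows:
    print(clean_description(row))
```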

Answer 1

You can apply substr conditionally, using when for the condition.
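The arguments to substr are easier to follow once you recall that Spark's Column.substr(startPos, length) counts positions from 1, not 0. The following is a small plain-Python model of that behaviour (the function spark_substr is hypothetical and only models positive start positions), applied to the Piotr row to show why a start of 3 and a length of length - 3 drop exactly the first two characters and the last one.

```python
def spark_substr(s: str, pos: int, length: int) -> str:
    """Hypothetical plain-Python model of Spark's 1-based substr,
    valid only for pos >= 1: start at position pos, take `length` chars."""
    start = pos - 1  # convert 1-based position to 0-based index
    return s[start:start + length]


s = '"""Piotr is ""running"'
print(len(s))                          # 22
print(spark_substr(s, 3, len(s) - 3))  # "Piotr is ""running
```

Starting at position 3 skips two characters, and taking 22 - 3 = 19 characters stops one short of the end, so the trailing quote is dropped as well.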

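from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

data = [('"""Piotr is ""running"',), ('"""Leo is ""running"',), ('"""Marta is ""running"',)]

df = spark.createDataFrame(data, ("description",))

# Spark's substr is 1-based: starting at position 3 and taking length - 3
# characters drops the first two characters and the last one.
# Rows that do not contain "Piotr" are passed through unchanged by otherwise().
df.withColumn(
    "description",
    F.when(
        F.col("description").contains("Piotr"),
        F.col("description").substr(F.lit(3), F.length(F.col("description")) - F.lit(3)),
    ).otherwise(F.col("description")),
).show(200, False)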

Output

+----------------------+
|description           |
+----------------------+
|"Piotr is ""running   |
|"""Leo is ""running"  |
|"""Marta is ""running"|
+----------------------+
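The substr approach removes characters by position, whatever they are. A stricter variant removes the quotes only when the string really starts with two quotes and ends with one. It is sketched here in plain Python with the standard re module; the helper name strip_outer_quotes is hypothetical. In Spark, the analogous expression would presumably be F.regexp_replace(F.col("description"), r'^""(.*)"$', "$1") inside the same when(...). Spark uses Java regex syntax, where a captured group is referenced as $1 rather than \1. That Spark form is untested here.

```python
import re

# Anchored pattern: two leading quotes, any content (captured), one trailing quote.
PATTERN = re.compile(r'^""(.*)"$')


def strip_outer_quotes(s: str) -> str:
    """Hypothetical helper: for strings mentioning Piotr, drop the first two
    and the last double quote, but only if they are actually present."""
    if "Piotr" in s:
        return PATTERN.sub(r"\1", s)
    return s


for row in ['"""Piotr is ""running"', '"""Leo is ""running"', "Piotr is running"]:
    print(strip_outer_quotes(row))
```

Because the greedy (.*) backtracks only as far as needed to match the final quote, the captured group keeps everything between the two leading quotes and the last one.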

Solution 1
0 2021-11-23 18:10:21
