How to remove a specific character (") from a string in a Spark DataFrame in Python?

I have the following Spark DataFrame:

   description

"""Piotr is ""running"
"""Leo is ""running"
"""Marta is ""running"

I want the following Spark DataFrame:

   description

"Piotr is ""running
"""Leo is ""running"
"""Marta is ""running"

The logic is: if the string contains Piotr, the first two " characters and the last " are removed. All other rows stay unchanged.

You can apply substr conditionally, using when for the conditional clause:

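from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Reuse the active session, or create one if `spark` is not already defined.
spark = SparkSession.builder.getOrCreate()

data = [('"""Piotr is ""running"',), ('"""Leo is ""running"',), ('"""Marta is ""running"',)]
df = spark.createDataFrame(data, ("description",))

# substr is 1-based: start at character 3 (skipping the first two quotes)
# and keep length - 3 characters (which also drops the last quote).
trimmed = F.col("description").substr(F.lit(3), F.length(F.col("description")) - F.lit(3))

df.withColumn(
    "description",
    F.when(F.col("description").contains("Piotr"), trimmed).otherwise(F.col("description")),
).show(200, False)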

Output

+----------------------+
|description           |
+----------------------+
|"Piotr is ""running   |
|"""Leo is ""running"  |
|"""Marta is ""running"|
+----------------------+
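
To see why the start position is 3 and the length is `length - 3`, the same arithmetic can be checked in plain Python without a Spark session. This is only a sketch: `spark_like_substr` and `strip_quotes_if_contains` are hypothetical helper names, and the emulation assumes a positive, 1-based start position, as used in the answer above.

```python
def spark_like_substr(s: str, start_pos: int, length: int) -> str:
    """Emulate a 1-based substr(start_pos, length) for a positive start_pos."""
    begin = start_pos - 1  # convert the 1-based position to a 0-based index
    return s[begin:begin + length]


def strip_quotes_if_contains(s: str, needle: str = "Piotr") -> str:
    """Drop the first two characters and the last one when `needle` is present."""
    if needle in s:
        # Start at character 3 and keep len(s) - 3 characters:
        # two are skipped at the front and one at the end.
        return spark_like_substr(s, 3, len(s) - 3)
    return s


rows = ['"""Piotr is ""running"', '"""Leo is ""running"', '"""Marta is ""running"']
result = [strip_quotes_if_contains(r) for r in rows]
for r in result:
    print(r)
```

In Python slice terms this is `s[2:-1]`. Note that the approach removes characters by position only, so it relies on the matching rows really starting with two quotes and ending with one.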

