Replace single quotes with double quotes in a PySpark DataFrame
With the code below I am writing a DataFrame to a CSV file. As my DataFrame contains "" for None, I have added replace("", None), because Null values are supposed to be represented as None instead of "" (double quotes).
newDf.coalesce(1).replace("", None).replace("'", "\"") \
    .write.format('csv') \
    .option('nullValue', None) \
    .option('header', 'true') \
    .option('delimiter', '|') \
    .mode('overwrite') \
    .save(destination_csv)
I tried adding .replace("'", "\""), but it doesn't work. The data also contains values with single quotes, e.g.:
Survey No. 123, 'Anjanadhri Godowns', CityName
I need to replace the single quotes in the DataFrame with double quotes. How can this be achieved?
You can use regexp_replace to replace single quotes with double quotes in all columns before writing the output:
import pyspark.sql.functions as F
df2 = df.select([F.regexp_replace(c, "'", '"').alias(c) for c in df.columns])
# then write output
# df2.coalesce(1).write(...)
from pyspark.sql.functions import translate
data_list = [(1, "'Name 1'"), (2, "'Name 2' and 'Something'")]
df = spark.createDataFrame(data = data_list, schema = ["ID", "my_col"])
# +---+--------------------+
# | ID| my_col|
# +---+--------------------+
# | 1| 'Name 1'|
# | 2|'Name 2' and 'Som...|
# +---+--------------------+
df.withColumn('my_col', translate('my_col', "'", '"')).show()
# +---+--------------------+
# | ID| my_col|
# +---+--------------------+
# | 1| "Name 1"|
# | 2|"Name 2" and "Som...|
# +---+--------------------+
This will replace all occurrences of the single quote character with a double quotation mark in the column my_col.