
Replace single quotes with double quotes in a PySpark DataFrame

I am writing a DataFrame to a CSV file with the code below.

My DataFrame contains empty strings ("") where values are missing, so I added replace("", None): null values are supposed to be represented as None rather than as "".

newDf.coalesce(1).replace("", None).replace("'", "\"").write.format('csv').option('nullValue', None).option('header', 'true').option('delimiter', '|').mode('overwrite').save(destination_csv)

I tried adding .replace("'", "\"") to the chain, but it has no effect.

The data also contains values with single quotes, for example:

Survey No. 123, 'Anjanadhri Godowns', CityName

I need to replace every single quote in the DataFrame with a double quote.

How can this be achieved?

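DataFrame.replace only substitutes values that match a whole cell, not substrings, so .replace("'", "\"") leaves quotes embedded in longer strings untouched. Use regexp_replace instead to replace single quotes with double quotes in every column before writing the output: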

import pyspark.sql.functions as F

df2 = df.select([F.regexp_replace(c, "'", '"').alias(c) for c in df.columns])

# then write output
# df2.coalesce(1).write(...)
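
One caveat with the all-columns version: regexp_replace returns a string column, so a numeric column passed through it comes back as a string. If the column types matter before the write, a sketch along the following lines restricts the replacement to string columns. The helper names string_columns and replace_single_quotes are hypothetical, chosen for illustration; they are not part of PySpark.

```python
def string_columns(dtypes):
    """Return the names of columns whose Spark type is 'string'.

    `dtypes` is a list of (name, type) pairs, as returned by `df.dtypes`.
    """
    return [name for name, dtype in dtypes if dtype == "string"]


def replace_single_quotes(df):
    """Swap ' for " in string columns only; pass other columns through unchanged."""
    # Imported here so that string_columns() can be used without a Spark install.
    import pyspark.sql.functions as F

    targets = set(string_columns(df.dtypes))
    return df.select([
        F.regexp_replace(F.col(c), "'", '"').alias(c) if c in targets else F.col(c)
        for c in df.columns
    ])
```

With the DataFrame from the question, this would be called as df2 = replace_single_quotes(newDf), followed by the same write chain as before.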

Using translate:

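from pyspark.sql import SparkSession
from pyspark.sql.functions import translate

# Reuse the active session if one exists, otherwise create one
spark = SparkSession.builder.getOrCreate()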

data_list = [(1, "'Name 1'"), (2, "'Name 2' and 'Something'")]
df = spark.createDataFrame(data = data_list, schema = ["ID", "my_col"])
# +---+--------------------+
# | ID|              my_col|
# +---+--------------------+
# |  1|            'Name 1'|
# |  2|'Name 2' and 'Som...|
# +---+--------------------+

df.withColumn('my_col', translate('my_col', "'", '"')).show()
# +---+--------------------+
# | ID|              my_col|
# +---+--------------------+
# |  1|            "Name 1"|
# |  2|"Name 2" and "Som...|
# +---+--------------------+

This replaces every single quote in the column my_col with a double quote.
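
translate maps characters one-for-one rather than matching a pattern. As a rough local check that needs no Spark session, Python's built-in str.translate applies the same kind of per-character mapping to the sample value from the question. This is only an analogy for illustration, not the Spark function itself.

```python
# Per-character mapping: every ' becomes ", all other characters are left alone.
quote_map = str.maketrans("'", '"')

sample = "Survey No. 123, 'Anjanadhri Godowns', CityName"
converted = sample.translate(quote_map)

print(converted)
# Survey No. 123, "Anjanadhri Godowns", CityName
```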
