Replace single quotes with double quotes in a PySpark DataFrame
With the code below I am writing a DataFrame to a CSV file. As my DataFrame contains "" for None, I have added replace("", None), because Null values are supposed to be represented as None instead of "" (double quotes).
newDf.coalesce(1).replace("", None).replace("'", "\"") \
    .write.format('csv') \
    .option('nullValue', None) \
    .option('header', 'true') \
    .option('delimiter', '|') \
    .mode('overwrite') \
    .save(destination_csv)
I tried adding .replace("'", "\""), but it doesn't work. The data also contains values with single quotes, e.g.:
Survey No. 123, 'Anjanadhri Godowns', CityName
I need to replace the single quotes in the DataFrame with double quotes. How can this be achieved?
You can use regexp_replace to replace single quotes with double quotes in all columns before writing the output:
import pyspark.sql.functions as F
df2 = df.select([F.regexp_replace(c, "'", '"').alias(c) for c in df.columns])
# then write output
# df2.coalesce(1).write(...)
from pyspark.sql.functions import translate
data_list = [(1, "'Name 1'"), (2, "'Name 2' and 'Something'")]
df = spark.createDataFrame(data = data_list, schema = ["ID", "my_col"])
# +---+--------------------+
# | ID| my_col|
# +---+--------------------+
# | 1| 'Name 1'|
# | 2|'Name 2' and 'Som...|
# +---+--------------------+
df.withColumn('my_col', translate('my_col', "'", '"')).show()
# +---+--------------------+
# | ID| my_col|
# +---+--------------------+
# | 1| "Name 1"|
# | 2|"Name 2" and "Som...|
# +---+--------------------+
This will replace all occurrences of the single quote character with a double quotation mark in the column my_col.