How to not print out Quote characters when creating a CSV file in Python

Question

I have a CSV file that I'm creating in Azure Databricks using Python. The code takes a data frame and generates a CSV file from it. The problem is that when a value in the data frame is empty, the output contains two double quotes, i.e. "", instead of an empty field.

Example Output

L1Code  L1 Desc1    L1 Desc2    L1 Desc3    L2Code
Beverage    Beverage    ""  ""  Drink Blends
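For comparison only (this is plain Python, not Spark): if the data is small enough to write from a single machine, Python's standard-library csv module can produce the row above without the quote pairs. The sketch below assumes that. header, row and to_tsv are illustrative names, and the values are copied from the example output. With quoting=csv.QUOTE_NONE no field is wrapped in quotes, and empty strings in a multi-field row come out as empty fields.

```python
import csv
import io

# Values copied from the example output; the empty strings are the fields
# that Spark rendered as "".
header = ["L1Code", "L1 Desc1", "L1 Desc2", "L1 Desc3", "L2Code"]
row = ["Beverage", "Beverage", "", "", "Drink Blends"]


def to_tsv(rows):
    """Return rows as tab-separated text with quoting turned off."""
    buf = io.StringIO()
    writer = csv.writer(
        buf,
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # never wrap fields in quote characters
        escapechar="\\",         # needed by QUOTE_NONE if a value contains a tab
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_tsv([header, row]), end="")
```

The escapechar is there because QUOTE_NONE raises csv.Error when a value contains the delimiter and no escape character is set.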

This is the code that I'm using to generate the file, where df is a Pandas dataframe that has already been created.

from pyspark.sql import SQLContext

# sc and dbutils are predefined globals in a Databricks notebook.
def createCsvFile(data, rootPath, filePath):
  tmpPath = rootPath + filePath + ".tmp"

  # coalesce(1) makes Spark write a single part file into the tmpPath directory
  (data.coalesce(1)
       .write.mode("overwrite")
       .format("com.databricks.spark.csv")
       .option("header", "true")
       .option("delimiter", "\t")
       .option("quoteMode", "NONE")
       .csv(tmpPath))

  # Copy the part file out of the directory as <filePath>.txt
  for file in dbutils.fs.ls(tmpPath + "/"):
    if file.name.endswith("csv"):
      dbutils.fs.cp(file.path, rootPath + filePath + ".txt")

  # Remove the temporary output directory
  dbutils.fs.rm(tmpPath, recurse=True)


sqlCtx = SQLContext(sc)
data = sqlCtx.createDataFrame(df)  # convert the pandas DataFrame to a Spark DataFrame
createCsvFile(data, '/mnt/adlsdata/Raw/Astute/', 'products')
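The copy-and-delete part of createCsvFile exists because Spark writes a directory of part files rather than a single file. A minimal local-disk sketch of the same idea, assuming an ordinary filesystem instead of DBFS (so pathlib and shutil stand in for dbutils.fs.ls, cp and rm); promote_single_part is a hypothetical name:

```python
import shutil
import tempfile
from pathlib import Path


def promote_single_part(tmp_dir: str, target_file: str, suffix: str = "csv") -> str:
    """Copy the one data file Spark wrote into tmp_dir to target_file, then
    delete tmp_dir. Local-disk stand-in for the dbutils.fs calls (assumption)."""
    tmp = Path(tmp_dir)
    # After coalesce(1) there should be exactly one data file, next to
    # marker files such as _SUCCESS.
    parts = [p for p in tmp.iterdir() if p.name.endswith(suffix)]
    if len(parts) != 1:
        raise RuntimeError(f"expected one *{suffix} file in {tmp}, found {len(parts)}")
    shutil.copyfile(parts[0], target_file)
    shutil.rmtree(tmp)  # counterpart of dbutils.fs.rm(..., recurse=True)
    return target_file


if __name__ == "__main__":
    # Simulate a Spark output directory and promote its single part file.
    with tempfile.TemporaryDirectory() as root:
        out_dir = Path(root) / "products.tmp"
        out_dir.mkdir()
        (out_dir / "_SUCCESS").write_text("")
        (out_dir / "part-00000-abc.csv").write_text("L1Code\tL2Code\n")
        target = promote_single_part(str(out_dir), str(Path(root) / "products.txt"))
        print(Path(target).read_text(), end="")
```

Unlike the loop in the question, which copies every file ending in csv (so a later match would overwrite an earlier one), this sketch stops unless exactly one such file exists, so a missing coalesce(1) shows up as an error.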

Answer 1

I ended up needing to use the emptyValue option to get it to work:

  (data.coalesce(1)
       .write.mode("overwrite")
       .format("com.databricks.spark.csv")
       .option("header", "true")
       .option("delimiter", "\t")
       .option("quoteMode", "NONE")
       .option("quote", u'\u0000')
       .option("nullValue", "")    # write nulls as nothing
       .option("emptyValue", "")   # write empty strings as nothing instead of ""
       .csv(rootPath + filePath + ".tmp"))
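The answer's options can be kept in one place and applied with DataFrameWriter.options(**...). A sketch, assuming Spark 2.4 or later, where the CSV writer supports emptyValue and by default writes empty strings as "". NO_QUOTE_TSV_OPTIONS and write_unquoted_tsv are hypothetical names. quoteMode is left out because, as far as I know, it is not an option of Spark's built-in CSV writer (it came from the older spark-csv package).

```python
# Hypothetical helper; assumes Spark 2.4+ (built-in CSV source with emptyValue).
NO_QUOTE_TSV_OPTIONS = {
    "header": "true",
    "delimiter": "\t",
    "quote": "\u0000",   # null character as the quote char, as in the answer
    "nullValue": "",     # write nulls as nothing
    "emptyValue": "",    # write empty strings as nothing instead of ""
}


def write_unquoted_tsv(data, path):
    """Write a PySpark DataFrame `data` to `path` as one tab-separated part file."""
    (data.coalesce(1)
         .write.mode("overwrite")
         .options(**NO_QUOTE_TSV_OPTIONS)
         .csv(path))
```

In the question's createCsvFile, the long write chain would then become write_unquoted_tsv(data, rootPath + filePath + ".tmp").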

solution1
0 2019-07-11 13:31:58
