
How to not print out quote characters when creating a CSV file in Python

I have a CSV file that I'm creating in Azure Databricks using Python. The code takes a data frame and generates a CSV file from it. The problem is that when there is an empty value in the data frame, the output contains two double quotes, i.e. "".

Example output

L1Code  L1 Desc1    L1 Desc2    L1 Desc3    L2Code
Beverage    Beverage    ""  ""  Drink Blends

This is the code that I'm using to generate the file, where df is a pandas DataFrame that has already been created.

from pyspark.sql import SQLContext

def createCsvFile(data, rootPath, filePath):
  (data.coalesce(1)
       .write.mode("overwrite")
       .format("com.databricks.spark.csv")
       .option("header", "true")
       .option("delimiter", "\t")
       .option("quoteMode", "NONE")
       .csv(rootPath + filePath + ".tmp"))

  fileList = dbutils.fs.ls(rootPath + filePath + ".tmp/")

  for file in fileList:
    if file.name.endswith("csv"):
      filename = file.path
      dbutils.fs.cp(filename, rootPath + filePath + ".txt")

  dbutils.fs.rm(rootPath + filePath + ".tmp", recurse=True)


sqlCtx = SQLContext(sc)
data = sqlCtx.createDataFrame(df)
createCsvFile(data, '/mnt/adlsdata/Raw/Astute/', 'products')

I ended up needing to use the emptyValue option to get it to work:

  (data.coalesce(1)
       .write.mode("overwrite")
       .format("com.databricks.spark.csv")
       .option("header", "true")
       .option("delimiter", "\t")
       .option("quoteMode", "NONE")
       .option("quote", u'\u0000')
       .option("nullValue", "")
       .option("emptyValue", "")
       .csv(rootPath + filePath + ".tmp"))
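
For readers who want to see what each setting is for, here is a minimal sketch that gathers the options from the write call above into one dictionary and applies them in a loop. As far as I can tell from the Spark CSV documentation, emptyValue defaults to "" when writing, which would explain the two double quotes in the output; treat that as an assumption and check it against your Spark version. The names TSV_WRITE_OPTIONS and apply_options are hypothetical and not part of PySpark.

```python
# Sketch only: the option values are copied from the write call in the answer.
# TSV_WRITE_OPTIONS and apply_options are hypothetical names, not PySpark APIs.

TSV_WRITE_OPTIONS = {
    "header": "true",      # write the column names as the first line
    "delimiter": "\t",     # tab-separated output
    "quoteMode": "NONE",   # kept from the original call
    "quote": "\u0000",     # null character used as the quote character
    "nullValue": "",       # string written for null values
    "emptyValue": "",      # string written for empty strings
}


def apply_options(writer, options):
    """Call writer.option(key, value) for each entry and return the result.

    `writer` is assumed to be any object with a chainable .option(key, value)
    method, such as the object returned by data.coalesce(1).write.
    """
    for key, value in options.items():
        writer = writer.option(key, value)
    return writer
```

Inside createCsvFile this could be used as apply_options(data.coalesce(1).write.mode("overwrite"), TSV_WRITE_OPTIONS).csv(rootPath + filePath + ".tmp"). Keeping the options in one place makes it easier to see that emptyValue is the one that differs from the original call.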
