I'm creating a CSV file in Azure Databricks using Python. The code takes a data frame and generates a CSV file from it. The problem is that when there is an empty value in the data frame, the output contains two double quotes, i.e. "".
Example output:
L1Code      L1 Desc1    L1 Desc2    L1 Desc3    L2Code
Beverage    Beverage    ""          ""          Drink Blends
This is the code that I'm using to generate the file, where df is a Pandas dataframe that has already been created.
from pyspark.sql import SQLContext

def createCsvFile(data, rootPath, filePath):
    data.coalesce(1).write.mode("overwrite").format("com.databricks.spark.csv").option("header", "true").option("delimiter", "\t").option("quoteMode", "NONE").csv(rootPath + filePath + ".tmp")
    fileList = dbutils.fs.ls(rootPath + filePath + ".tmp/")
    for file in fileList:
        if file.name.endswith("csv"):
            filename = file.path
    dbutils.fs.cp(filename, rootPath + filePath + ".txt")
    dbutils.fs.rm(rootPath + filePath + ".tmp", recurse=True)

sqlCtx = SQLContext(sc)
data = sqlCtx.createDataFrame(df)
createCsvFile(data, '/mnt/adlsdata/Raw/Astute/', 'products')
I ended up needing to use the emptyValue option to get it to work:
data.coalesce(1).write.mode("overwrite").format("com.databricks.spark.csv").option("header", "true").option("delimiter", "\t").option("quoteMode", "NONE").option("quote", u'\u0000').option("nullValue", "").option("emptyValue", "").csv(rootPath + filePath + ".tmp")
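As far as I understand it, nullValue only controls how nulls are written, while emptyValue controls how empty strings are written, and recent Spark versions write an empty string as "" unless told otherwise. That would explain why both options are needed when the pandas data frame holds empty strings rather than nulls. Below is a minimal sketch of the same write with the options collected in one dict. The names CSV_WRITE_OPTIONS and write_tsv are made up for illustration, and data is assumed to be a Spark DataFrame like the one in the question.

```python
# Hypothetical helper: the same options as the one-liner above, kept in a dict
# so they are easier to read and reuse. The names here are illustrative only.
CSV_WRITE_OPTIONS = {
    "header": "true",
    "delimiter": "\t",
    "quoteMode": "NONE",
    "quote": "\u0000",   # NUL character as the quote char, as in the answer
    "nullValue": "",     # how nulls are written
    "emptyValue": "",    # how empty strings are written
}


def write_tsv(data, path, options=None):
    """Write `data` (assumed to be a Spark DataFrame) to `path` as one part file."""
    opts = dict(CSV_WRITE_OPTIONS if options is None else options)
    writer = data.coalesce(1).write.mode("overwrite")
    for key, value in opts.items():
        # DataFrameWriter.option returns the writer, so the calls can be chained
        writer = writer.option(key, value)
    writer.csv(path)
```

Inside createCsvFile this would replace the long write line, e.g. write_tsv(data, rootPath + filePath + ".tmp"); the rest of the function (finding the part file, copying it, and removing the temporary folder) stays the same.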