
PySpark: how to write a DataFrame with a comma as the decimal separator

This is the function I use to write files:

# PySpark
def write_file(dataframe=None, dest_dir=None, filename=None):
    temp_dir = dest_dir + '/tmp/'
    # Write a single semicolon-delimited, UTF-8 CSV part file with a
    # header row into temp_dir. 'csv' is the built-in source; the old
    # 'com.databricks.spark.csv' name still works as an alias for it.
    dataframe.coalesce(1) \
        .write \
        .format('csv') \
        .mode('overwrite') \
        .option('header', True) \
        .option('emptyValue', None) \
        .option('nullValue', None) \
        .option('delimiter', ';') \
        .option('dateFormat', 'dd-MMM-yyyy') \
        .option('encoding', 'UTF-8') \
        .save(temp_dir)
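
For reference, a minimal way to call it might look like this (the SparkSession, the sample data, and the paths are placeholder assumptions; note that the filename argument is unused in the snippet as posted):

# A minimal usage sketch; the session and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1.5,), (2.0,)], ['amount'])
write_file(dataframe=df, dest_dir='/data/export', filename='report.csv')
# The single part file lands under /data/export/tmp/.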

I need to tweak it so that it replaces the dot decimal separator with a comma. When I open the CSV/TXT files it produces in Excel, a value like 1.000000 is read as a million instead of 1, because in a comma-decimal locale Excel treats the dot as a thousands separator. These fields have type decimal(38,12).
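
To make the symptom concrete, here is a small reproduction (a sketch, assuming an active SparkSession named spark):

# A decimal(38,12) value is serialized with a dot and full scale,
# which a comma-decimal Excel locale reads as a large integer.
from decimal import Decimal
from pyspark.sql.types import StructType, StructField, DecimalType

schema = StructType([StructField('amount', DecimalType(38, 12))])
demo = spark.createDataFrame([(Decimal('1'),)], schema)
demo.show()  # displays something like 1.000000000000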

Just replace the dot with a comma for that column. Cast it to string first; if the column stays numeric, Spark will render it with a dot again when writing:

import pyspark.sql.functions as F

df = df.withColumn('some_col', F.regexp_replace(F.col('some_col').cast('string'), '\\.', ','))

The dot is escaped because regexp_replace treats the pattern as a regular expression. Now write the DataFrame as before.
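
If several columns have that decimal type, the same replacement can be looped over every DecimalType column before calling write_file. A sketch under that assumption (the destination path and filename are placeholders):

from pyspark.sql import functions as F
from pyspark.sql.types import DecimalType

# Cast each decimal column to string and swap the dot for a comma;
# the dot is escaped because regexp_replace takes a regex pattern.
for field in df.schema.fields:
    if isinstance(field.dataType, DecimalType):
        df = df.withColumn(
            field.name,
            F.regexp_replace(F.col(field.name).cast('string'), '\\.', ',')
        )

write_file(dataframe=df, dest_dir='/data/export', filename='report.csv')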
