
PySpark: how to write a DataFrame with a comma as the decimal separator

This is the function I use to write files:

# PySpark
def write_file(dataframe=None, dest_dir=None, filename=None):
    temp_dir = dest_dir + '/tmp/'
    # Write a single semicolon-delimited, UTF-8 CSV part file with a
    # header row into temp_dir. 'csv' is the built-in source; the old
    # 'com.databricks.spark.csv' name still works as an alias for it.
    dataframe.coalesce(1) \
        .write \
        .format('csv') \
        .mode('overwrite') \
        .option('header', True) \
        .option('emptyValue', None) \
        .option('nullValue', None) \
        .option('delimiter', ';') \
        .option('dateFormat', 'dd-MMM-yyyy') \
        .option('encoding', 'UTF-8') \
        .save(temp_dir)
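
For reference, a minimal way to call it might look like this (the SparkSession, the sample data, and the paths are placeholder assumptions; note that the filename argument is unused in the snippet as posted):

# A minimal usage sketch; the session and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1.5,), (2.0,)], ['amount'])
write_file(dataframe=df, dest_dir='/data/export', filename='report.csv')
# The single part file lands under /data/export/tmp/.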

I need to tweak it so that it replaces the dot decimal separator with a comma. When I open the CSV/TXT files it produces in Excel, a value like 1.000000 is read as a million instead of 1, because in a comma-decimal locale Excel treats the dot as a thousands separator. These fields have type decimal(38,12).
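
To make the symptom concrete, here is a small reproduction (a sketch, assuming an active SparkSession named spark):

# A decimal(38,12) value is serialized with a dot and full scale,
# which a comma-decimal Excel locale reads as a large integer.
from decimal import Decimal
from pyspark.sql.types import StructType, StructField, DecimalType

schema = StructType([StructField('amount', DecimalType(38, 12))])
demo = spark.createDataFrame([(Decimal('1'),)], schema)
demo.show()  # displays something like 1.000000000000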

Just replace the dot with a comma for that column. Cast it to string first; if the column stays numeric, Spark will render it with a dot again when writing:

import pyspark.sql.functions as F

df = df.withColumn('some_col', F.regexp_replace(F.col('some_col').cast('string'), '\\.', ','))

The dot is escaped because regexp_replace treats the pattern as a regular expression. Now write the DataFrame as before.
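
If several columns have that decimal type, the same replacement can be looped over every DecimalType column before calling write_file. A sketch under that assumption (the destination path and filename are placeholders):

from pyspark.sql import functions as F
from pyspark.sql.types import DecimalType

# Cast each decimal column to string and swap the dot for a comma;
# the dot is escaped because regexp_replace takes a regex pattern.
for field in df.schema.fields:
    if isinstance(field.dataType, DecimalType):
        df = df.withColumn(
            field.name,
            F.regexp_replace(F.col(field.name).cast('string'), '\\.', ',')
        )

write_file(dataframe=df, dest_dir='/data/export', filename='report.csv')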
