This is the function I use to write files:
#pyspark
    def write_file(dataframe=None, dest_dir=None, filename=None):
        import os
        temp_dir = dest_dir + '/tmp/'
        dataframe.coalesce(1) \
            .write \
            .format('com.databricks.spark.csv') \
            .mode('overwrite') \
            .option('header', True) \
            .option('emptyValue', None) \
            .option('nullValue', None) \
            .option('delimiter', ';') \
            .option('dateFormat', 'dd-MMM-yyyy') \
            .option('encoding', 'UTF-8') \
            .save(temp_dir)

(Note: the option key is `dateFormat`; an unrecognized key such as `DataFormat` is silently ignored by the CSV writer.)
I need to tweak it so that it replaces the dot decimal separator with a comma. When I open CSV/TXT files spooled with this function in Excel, it treats, for instance, 1.000000 as a million instead of 1. These fields have type decimal(38,12).
Just replace the dot with a comma for that column before writing. Cast the decimal to a string first and swap the separator (the dot must be escaped, since `regexp_replace` takes a regex; don't cast back to `float`, or the comma would be lost again):

    df = df.withColumn('some_col', F.regexp_replace(F.col('some_col').cast('string'), '\\.', ','))

now, write as is.
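The intended transformation (dot → comma on the string form of the decimal) can be illustrated with plain Python, without a Spark session; `to_comma_decimal` is a hypothetical helper, not part of the answer's code:

```python
import re

def to_comma_decimal(value):
    """Mimic the Spark expression: render the value as a string,
    then replace the dot separator with a comma (dot escaped in the regex)."""
    return re.sub(r'\.', ',', str(value))

print(to_comma_decimal('1.000000000000'))  # -> 1,000000000000
```

Excel on locales with a comma decimal separator will then read the value as 1 rather than a million.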