
How to format a column of float numbers as Brazilian currency in Spark SQL/PySpark?

In Spark SQL or PySpark, I need to format a float column as Brazilian currency. This is what I'm doing:

from pyspark.sql import SparkSession, functions as spf

spark = SparkSession.builder.getOrCreate()

data=[('bruce','wayne','1950-01-01','Male',9876543.21)]

columns=["NAME","LASTNAME","DOB","SEX","GOLD"]
df=spark.createDataFrame(data=data,schema=columns)

df= df \
.withColumn("GOLD_STRING",spf.concat(spf.lit("R$ "),
                                    spf.when(spf.substring(df.GOLD.cast("string"),-21,3)!="",spf.concat(spf.substring(df.GOLD.cast("string"),-21,3),spf.lit('.'))).otherwise(""),
                                    spf.when(spf.substring(df.GOLD.cast("string"),-18,3)!="",spf.concat(spf.substring(df.GOLD.cast("string"),-18,3),spf.lit('.'))).otherwise(""),
                                    spf.when(spf.substring(df.GOLD.cast("string"),-15,3)!="",spf.concat(spf.substring(df.GOLD.cast("string"),-15,3),spf.lit('.'))).otherwise(""),
                                    spf.when(spf.substring(df.GOLD.cast("string"),-12,3)!="",spf.concat(spf.substring(df.GOLD.cast("string"),-12,3),spf.lit('.'))).otherwise(""),
                                    spf.when(spf.substring(df.GOLD.cast("string"),-9,3)!="",spf.concat(spf.substring(df.GOLD.cast("string"),-9,3),spf.lit('.'))).otherwise(""),
                                    spf.when(spf.substring(df.GOLD.cast("string"),-6,3)!="",spf.concat(spf.substring(df.GOLD.cast("string"),-6,3),spf.lit(','))).otherwise(""),
                                    spf.when(spf.substring(df.GOLD.cast("string"),-2,2)!="",spf.concat(spf.substring(df.GOLD.cast("string"),-2,3))).otherwise("00")))
df.show()

As a result, I get:

+-----+--------+----------+----+----------+---------------+
| NAME|LASTNAME|       DOB| SEX|      GOLD|    GOLD_STRING|
+-----+--------+----------+----+----------+---------------+
|bruce|   wayne|1950-01-01|Male|9876543.21|R$ 9.876.543,21|
+-----+--------+----------+----+----------+---------------+

This is exactly what I need, but is there a simpler or more performant way? Thanks in advance! Any help will be greatly appreciated!

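For number formatting, the format_number function renders the number in the '#,###,###.##' pattern, with a comma as the thousands separator and a period as the decimal separator. You still have to swap the two separators afterwards, which takes several nested replace calls and a temporary placeholder (';' here):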

df = df.withColumn("GOLD_STRING", spf.expr("concat('R$ ', replace(replace(replace(format_number(GOLD, 2), '.', ';'), ',', '.'), ';', ','))"))
df.show()

+-----+--------+----------+----+----------+---------------+
| NAME|LASTNAME|       DOB| SEX|      GOLD|    GOLD_STRING|
+-----+--------+----------+----+----------+---------------+
|bruce|   wayne|1950-01-01|Male|9876543.21|R$ 9.876.543,21|
+-----+--------+----------+----+----------+---------------+
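
To sanity-check the expected strings without starting a Spark session, the same separator swap can be sketched in plain Python. This is only an illustration: format_brl is a hypothetical helper name, not part of PySpark, and it mirrors the format_number-then-swap idea rather than calling Spark.

```python
# Plain-Python sketch of the separator swap, handy for checking expected values.
# format_brl is a hypothetical helper name, not part of PySpark.

# Swap ',' and '.' in a single pass, so no temporary placeholder (';') is needed.
_SWAP_SEPARATORS = str.maketrans({",": ".", ".": ","})


def format_brl(value: float) -> str:
    """Format a number as Brazilian currency text, e.g. 'R$ 9.876.543,21'."""
    # '9,876,543.21': same shape as the output of format_number(GOLD, 2)
    us_style = f"{value:,.2f}"
    return "R$ " + us_style.translate(_SWAP_SEPARATORS)


print(format_brl(9876543.21))
print(format_brl(0.5))
```

Spark SQL also has a translate function that maps characters one-to-one, so something like translate(format_number(GOLD, 2), ',.', '.,') should be able to replace the three nested replace calls. Treat that as an untested suggestion and check it against your Spark version.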
