
Custom Formatting of JSON output using Spark

I have a dataset with a bunch of BigDecimal values. I would like to output these records to a JSON file, but when I do, the BigDecimal values are often written with trailing zeros ( 123.4000000000000 ). The spec we must conform to does not allow this (for reasons I don't understand).

I am trying to see if there is a way to override how the data is printed to JSON. Currently, my best idea is to convert each record to a string using Jackson and then write the data using df.write().text(..) rather than JSON.
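If you do go the Jackson route, the trailing zeros can be removed before serialization using plain java.math.BigDecimal. A minimal sketch (the helper name toJsonNumber is just illustrative, not part of any API mentioned above):

```java
import java.math.BigDecimal;

public class StripZeros {
    // Illustrative helper: render a BigDecimal without trailing zeros.
    static String toJsonNumber(BigDecimal value) {
        // stripTrailingZeros() drops the excess scale; toPlainString() avoids
        // scientific notation such as 6E+2 that toString() can produce
        // for values like new BigDecimal("600.000").stripTrailingZeros().
        return value.stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(toJsonNumber(new BigDecimal("123.4000000000000"))); // 123.4
        System.out.println(toJsonNumber(new BigDecimal("600.000")));           // 600
    }
}
```

A string produced this way could then be written out with a custom Jackson serializer, or simply stored in a string column before df.write().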

I suggest converting the Decimal type to String before writing to JSON.

The code below is in Scala, but you can easily adapt it to Java.

import org.apache.spark.sql.types.StringType

// COLUMN_NAME is your DataFrame column name.

val new_df = df.withColumn("COLUMN_NAME_TMP", df("COLUMN_NAME").cast(StringType))
  .drop("COLUMN_NAME")
  .withColumnRenamed("COLUMN_NAME_TMP", "COLUMN_NAME")

Note: the technical posts on this site are licensed under CC BY-SA 4.0; please credit the original source when republishing.
