
Custom Formatting of JSON output using Spark

I have a dataset with a bunch of BigDecimal values. I would like to output these records to a JSON file, but when I do, the BigDecimal values are often written with trailing zeros ( 123.4000000000000 ). The spec we must conform to does not allow this (for reasons I don't understand).

I am trying to see if there is a way to override how the data is printed to JSON. Currently, my best idea is to convert each record to a string using Jackson and then write the data using df.write().text(..) rather than JSON.
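If you do go the Jackson route, the trailing zeros can be removed before serialization using plain java.math.BigDecimal. A minimal sketch (the helper name toJsonNumber is just illustrative, not part of any API mentioned above):

```java
import java.math.BigDecimal;

public class StripZeros {
    // Illustrative helper: render a BigDecimal without trailing zeros.
    static String toJsonNumber(BigDecimal value) {
        // stripTrailingZeros() drops the excess scale; toPlainString() avoids
        // scientific notation such as 6E+2 that toString() can produce
        // for values like new BigDecimal("600.000").stripTrailingZeros().
        return value.stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(toJsonNumber(new BigDecimal("123.4000000000000"))); // 123.4
        System.out.println(toJsonNumber(new BigDecimal("600.000")));           // 600
    }
}
```

A string produced this way could then be written out with a custom Jackson serializer, or simply stored in a string column before df.write().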

I suggest converting the Decimal type to String before writing to JSON.

The code below is in Scala, but you can easily adapt it to Java.

import org.apache.spark.sql.types.StringType

// COLUMN_NAME is your DataFrame column name.

val new_df = df.withColumn("COLUMN_NAME_TMP", df("COLUMN_NAME").cast(StringType))
  .drop("COLUMN_NAME")
  .withColumnRenamed("COLUMN_NAME_TMP", "COLUMN_NAME")

Note: the technical posts on this site are licensed under CC BY-SA 4.0; please credit the original source when republishing.
