write a spark Dataset to json with all keys in the schema, including null columns
I am writing a dataset to json using:
ds.coalesce(1).write.format("json").option("nullValue",null).save("project/src/test/resources")
For records that have columns with null values, the json document does not write those keys at all.
Is there a way to force null-valued keys into the json output?
This is needed because I use this json to read it into another dataset (in a test case), and I cannot enforce a schema if some documents do not have all the keys of the case class (I am reading it by putting the json file under the resources folder and transforming it to a dataset via RDD[String], as explained here: https://databaseline.bitbucket.io/a-quickie-on-reading-json-resource-files-in-apache-spark/ )
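A minimal sketch of that read-back path, assuming a simplified placeholder case class Person and a placeholder resource file people.json (both stand-ins for the real test fixtures):

import org.apache.spark.sql.SparkSession

// Placeholder case class standing in for the real schema.
case class Person(name: String, age: Option[Long])

object ReadJsonResource {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("read-json-resource").getOrCreate()
    import spark.implicits._

    // Read the test resource from the classpath as one JSON document per line.
    val lines = scala.io.Source
      .fromInputStream(getClass.getResourceAsStream("/people.json"))
      .getLines()
      .toSeq

    // Parse the lines and map them onto the case class; this is the step that
    // breaks when a key is absent from every document, because the inferred
    // schema then lacks that column entirely.
    val ds = spark.read.json(spark.createDataset(lines)).as[Person]
    ds.show()
  }
}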
I agree with @philantrovert.
ds.na.fill("")
.coalesce(1)
.write
.format("json")
.save("project/src/test/resources")
Since Datasets are immutable you are not altering the data in ds, and you can process it (complete with null values and all) in any following code. You are simply replacing the null values with an empty string in the saved file.
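A minimal, self-contained sketch of that behaviour; the Record case class and the output path are made-up placeholders:

import org.apache.spark.sql.SparkSession

// Made-up case class just to demonstrate the fill-then-write pattern.
case class Record(id: String, note: String)

object FillThenWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("fill-then-write").getOrCreate()
    import spark.implicits._

    val ds = Seq(Record("a", "hello"), Record("b", null)).toDS()

    // The saved file contains {"id":"b","note":""} -- the key is kept because
    // the null string column was filled with "" before writing.
    ds.na.fill("").coalesce(1).write.format("json").save("/tmp/json-out")

    // ds itself still holds the original null, so downstream code sees it.
    ds.filter($"note".isNull).show()
  }
}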
Since Pyspark 3, one can use the ignoreNullFields option when writing to a JSON file.
spark_dataframe.write.json(output_path, ignoreNullFields=False)
Pyspark docs: https://spark.apache.org/docs/3.1.1/api/python/_modules/pyspark/sql/readwriter.html#DataFrameWriter.json
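The same behaviour is exposed as a writer option on the Scala side as well (Spark 3+); a minimal sketch, reusing the ds from the question:

ds.coalesce(1)
  .write
  .format("json")
  .option("ignoreNullFields", "false")
  .save("project/src/test/resources")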