
Reading a JSON file in PySpark without changing the old schema

I receive a JSON file every day with 10 attributes, but on some days, if an attribute has no value, the sender omits it and the file contains only 9 attributes. How can I read the JSON file in PySpark without changing the old table schema?

It sounds like you should enforce a schema when reading the files. I'm assuming you currently have something like this:

df = spark.read.json(path_to_json_files)

In order to preserve all the attributes/fields, use the schema like so:

df = spark.read.schema(file_schema).json(path_to_json_files)

To get `file_schema`, you can infer it from an old file in which you know every attribute is present:

file_schema = spark.read.json(full_json_file).schema

