
Reading json file in pyspark without changing old schema

I receive a JSON file every day with 10 attributes, but on some days, if an attribute has no value, they send only 9 attributes and the 10th attribute is absent from the JSON. How can I read the JSON file in pyspark without changing the old table schema?

It seems like you should enforce a schema when reading the files. I'm assuming you have something like this:

df = spark.read.json(path_to_json_files)

In order to preserve all the attributes/fields, pass the schema explicitly, like so:

df = spark.read.schema(file_schema).json(path_to_json_files)

To get the file_schema, you can use an old file in which you know every attribute is present:

file_schema = spark.read.json(full_json_file).schema
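
Putting it together, here is a minimal runnable sketch of this approach; the paths data/complete_day.json and data/json/*.json are placeholders for your own files:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("json-fixed-schema").getOrCreate()

# Infer the full 10-attribute schema once, from a day where
# every attribute is known to be present.
file_schema = spark.read.json("data/complete_day.json").schema

# Apply that schema to every daily file; attributes missing from a
# given day's JSON come back as null columns instead of being dropped.
df = spark.read.schema(file_schema).json("data/json/*.json")
df.printSchema()

Alternatively, you can define the schema by hand with a StructType so it does not depend on any particular sample file.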
