
How to read a JSON file and create a schema using PySpark

I have a JSON file in the format below. How can I read it and build a schema from it using PySpark?

{
        "Entry": {
                "DataType": "Integer",
                "Length": "7",
                "Required": "True",
                "Description": "Enrty"
        },
        "Per": {
                "DataType": "String",
                "Length": "2",
                "Required": "True",
                "Description": "Per"
        }
}

You can do the following to build the schema from the JSON file you have:

from pyspark.sql import types as t

def getDataType(DataType):
    # Map the metadata's DataType string to the matching Spark SQL type
    if DataType == 'Float':
        return t.FloatType()
    elif DataType == 'Integer':
        return t.IntegerType()
    elif DataType == 'Date':
        return t.DateType()
    elif DataType == 'Double':
        return t.DoubleType()
    else:
        return t.StringType()

def getNullable(Required):
    # A required field must not be nullable, so invert the flag
    return Required != 'True'

# 'spark' is the active SparkSession; the 'multiline' option is needed
# because each JSON record spans several lines
df = spark.read.option('multiline', True).json('path to json file')

# The file is read as a single row with one struct per top-level key;
# build a StructField from each struct's metadata
schema = t.StructType([
    t.StructField(x['Description'], getDataType(x['DataType']), getNullable(x['Required']))
    for x in df.rdd.first()
])
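
As a side note, the metadata file is tiny, so you do not strictly need a Spark read just to parse it. Here is a minimal sketch using the standard-library json module together with the helper functions above; it assumes the file sits on the local filesystem rather than on HDFS/S3:

import json

# Parse the small metadata file locally instead of through Spark
with open('path to json file') as f:
    meta = json.load(f)

schema = t.StructType([
    t.StructField(v['Description'], getDataType(v['DataType']), getNullable(v['Required']))
    for v in meta.values()
])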

In either case, the resulting schema is:

StructType(List(StructField(Entry,IntegerType,false),StructField(Per,StringType,false)))
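
Once you have the schema, you can pass it to a reader for the actual data file. A sketch, assuming the data arrives as a headerless CSV; the question does not say which format the real data uses, and 'path to data file' is a hypothetical placeholder:

# Apply the constructed schema when reading the data itself
data_df = spark.read.csv('path to data file', schema=schema, header=False)
data_df.printSchema()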
