I have a JSON file in the format below. How can I read it and create a schema from it using PySpark?
{
  "Entry": {
    "DataType": "Integer",
    "Length": "7",
    "Required": "True",
    "Description": "Entry"
  },
  "Per": {
    "DataType": "String",
    "Length": "2",
    "Required": "True",
    "Description": "Per"
  }
}
You can do the following to build the schema from the JSON file you have:
from pyspark.sql import types as t

def getDataType(DataType):
    # Map the spec's DataType string to a Spark SQL type;
    # anything unrecognised falls back to StringType.
    if DataType == 'Float':
        return t.FloatType()
    elif DataType == 'Integer':
        return t.IntegerType()
    elif DataType == 'Date':
        return t.DateType()
    elif DataType == 'Double':
        return t.DoubleType()
    else:
        return t.StringType()

def getNullable(Required):
    # The spec stores booleans as strings ('True'/'False').
    return Required == 'True'
df = spark.read.option('multiline', True).json('path to json file')

# The single row of df holds one struct per field in the spec;
# build a StructField from each of them.
schema = t.StructType([
    t.StructField(x['Description'], getDataType(x['DataType']), getNullable(x['Required']))
    for x in df.rdd.first()
])
so the schema should be:

StructType(List(StructField(Entry,IntegerType,true),StructField(Per,StringType,true)))
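Note that `spark.read.json` launches a Spark job just to read a small spec file. Since the spec is plain JSON, you can equally parse it with Python's built-in `json` module and feed the result to `t.StructField`. A minimal sketch under that assumption (`TYPE_NAMES` and `parse_spec` are hypothetical names, and the type-name strings stand in for the `t.FloatType()` etc. constructors used above):

```python
import json

# Spec DataType string -> Spark SQL type name. With pyspark imported,
# replace these strings with t.IntegerType(), t.FloatType(), etc.
TYPE_NAMES = {
    'Float': 'FloatType',
    'Integer': 'IntegerType',
    'Date': 'DateType',
    'Double': 'DoubleType',
}

def parse_spec(spec):
    """Turn the spec dict into (name, type, nullable) triples,
    ready to be passed to t.StructField."""
    return [
        (v['Description'],
         TYPE_NAMES.get(v['DataType'], 'StringType'),  # fallback: StringType
         v['Required'] == 'True')                      # string -> bool
        for v in spec.values()
    ]

# Inline copy of the spec from the question; in practice you would
# use json.load(open('path to json file')).
spec = json.loads('''{
  "Entry": {"DataType": "Integer", "Length": "7",
            "Required": "True", "Description": "Entry"},
  "Per":   {"DataType": "String", "Length": "2",
            "Required": "True", "Description": "Per"}
}''')

print(parse_spec(spec))
# [('Entry', 'IntegerType', True), ('Per', 'StringType', True)]
```

This avoids the `df.rdd.first()` round trip entirely; the driver reads the spec locally and only the resulting `StructType` is handed to Spark.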