I want to create a DataFrame with a specified schema in Python (PySpark). Here is what I have done so far.
I have a sample.parm file in which I have defined the schema, one column per line (column name, data type, nullable flag), for example:
Account_type,string,True
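For illustration, here is a minimal stdlib sketch of what csv.reader returns for the sample line above. Every column comes back as a str, so the nullable column is the text 'True', not the Python bool True. The in-memory string stands in for the real file, which is an assumption made so the snippet runs on its own.

```python
import csv
import io

# Stand-in for the contents of sample.parm (the sample line from the question):
# <column name>,<type name>,<nullable flag>
parm_text = "Account_type,string,True\n"

# csv.reader yields each row as a list of str; it performs no type conversion.
rows = list(csv.reader(io.StringIO(parm_text), delimiter=","))
name, type_name, nullable_flag = rows[0]

print(repr(nullable_flag))           # the text 'True', not the bool True
print(type(nullable_flag).__name__)  # str
```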
I have written a Python script, sample.py, that reads sample.parm, generates the schema from it, and then creates a DataFrame with that user-defined schema:
import csv
from pyspark.sql import SparkSession
from pyspark.sql.types import (BooleanType, ByteType, DateType, DecimalType,
                               DoubleType, FloatType, IntegerType, LongType,
                               ShortType, StringType, StructField, StructType,
                               TimestampType)

spark = SparkSession.builder.getOrCreate()

def schema():
    # Each line of sample.parm is: column name, type name, nullable flag
    with open('<path>/sample.parm', 'r') as parm_file:
        reader = csv.reader(parm_file, delimiter=",")
        filteredSchema = []
        for fieldName in reader:
            if fieldName[1].lower() == "decimal":
                filteredSchema.append([fieldName[0], DecimalType(), fieldName[2]])
            elif fieldName[1].lower() == "string":
                filteredSchema.append([fieldName[0], StringType(), fieldName[2]])
            elif fieldName[1].lower() == "integer":
                filteredSchema.append([fieldName[0], IntegerType(), fieldName[2]])
            elif fieldName[1].lower() == "date":
                filteredSchema.append([fieldName[0], DateType(), fieldName[2]])
            elif fieldName[1].lower() == "byte":
                filteredSchema.append([fieldName[0], ByteType(), fieldName[2]])
            elif fieldName[1].lower() == "boolean":
                filteredSchema.append([fieldName[0], BooleanType(), fieldName[2]])
            elif fieldName[1].lower() == "short":
                filteredSchema.append([fieldName[0], ShortType(), fieldName[2]])
            elif fieldName[1].lower() == "long":
                filteredSchema.append([fieldName[0], LongType(), fieldName[2]])
            elif fieldName[1].lower() == "double":
                filteredSchema.append([fieldName[0], DoubleType(), fieldName[2]])
            elif fieldName[1].lower() == "float":
                filteredSchema.append([fieldName[0], FloatType(), fieldName[2]])
            elif fieldName[1].lower() == "timestamp":
                filteredSchema.append([fieldName[0], TimestampType(), fieldName[2]])
    # line[2] is passed to StructField exactly as it was read from the file
    struct_schema = [StructField(line[0], line[1], line[2]) for line in filteredSchema]
    schema = StructType(struct_schema)
    return schema

def create_dataframe(path):
    # Read a tab-separated file using the schema built from sample.parm
    val = spark.read.schema(schema()).csv(path, sep='\t')
    print(val.take(1))
But I am getting the following error:
pyspark.sql.utils.IllegalArgumentException: u'Failed to convert the JSON string \\'{"metadata":{},"name":"account_type","nullable":"True","type":"string"}\\' to a field.'
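The field JSON quoted in this error contains "nullable":"True", which is a JSON string, whereas Spark's schema JSON expects nullable to be a JSON boolean (compare "nullable":true in the answer's JSON). The stdlib sketch below uses a hypothetical helper, field_json, to reproduce the quoted string and to show what changes when a real Python bool is passed instead. It only mimics the formatting of the message (sorted keys, no spaces); it is not Spark's own serializer.

```python
import json

def field_json(name, type_name, nullable):
    """Serialize one schema field in the same shape the error message shows:
    keys sorted, no spaces. Illustrative only, not Spark's code."""
    field = {"metadata": {}, "name": name, "nullable": nullable, "type": type_name}
    return json.dumps(field, sort_keys=True, separators=(",", ":"))

# Flag passed straight through from the parm file (a str) ...
bad = field_json("account_type", "string", "True")
# ... versus a real Python bool, which json.dumps writes as true.
good = field_json("account_type", "string", True)

print(bad)
print(good)
```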
Can anyone help me figure this out? I'd appreciate your help.
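I think the JSON is not built correctly. The error message shows a single field whose "nullable" value is the string "True", while Spark expects a JSON boolean (true or false), and the top-level "type" and "fields" keys are missing. Please try the following JSON for your schema: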
{"type":"struct","fields":[{"name":"account_type","type":"string","nullable":true,"metadata":{"name":"account_type","scale":0}}]}
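The JSON above can also be generated from sample.parm rather than written by hand. Below is a minimal stdlib-only sketch, assuming the three-column parm format from the question; TYPE_NAMES, parse_nullable and schema_json are hypothetical names. It converts the flag by comparing the text (bool("False") would be True) and leaves metadata as an empty object, which is what PySpark itself emits for a StructField created without metadata.

```python
import csv
import io
import json

# Hypothetical map from parm-file type names to Spark's JSON type names.
# DecimalType() defaults to precision 10, scale 0, hence "decimal(10,0)".
TYPE_NAMES = {
    "decimal": "decimal(10,0)",
    "string": "string",
    "integer": "integer",
    "date": "date",
    "byte": "byte",
    "boolean": "boolean",
    "short": "short",
    "long": "long",
    "double": "double",
    "float": "float",
    "timestamp": "timestamp",
}

def parse_nullable(flag):
    """Turn the parm file's text flag into a real bool.
    bool("False") is True, so compare the text instead of calling bool()."""
    text = flag.strip().lower()
    if text in ("true", "false"):
        return text == "true"
    raise ValueError(f"nullable flag must be True or False, got {flag!r}")

def schema_json(parm_text):
    """Build a struct-schema dict (same shape as the JSON above) from parm lines."""
    fields = []
    for row in csv.reader(io.StringIO(parm_text), delimiter=","):
        if not row:
            continue  # csv.reader yields [] for blank lines
        name, type_name, nullable = row
        fields.append({
            "name": name,
            "type": TYPE_NAMES[type_name.strip().lower()],
            "nullable": parse_nullable(nullable),
            "metadata": {},
        })
    return {"type": "struct", "fields": fields}

schema_dict = schema_json("Account_type,string,True\n")
print(json.dumps(schema_dict))
```

Under these assumptions, the resulting dict could be passed to pyspark.sql.types.StructType.fromJson(...) and the returned StructType to spark.read.schema(...). Within the question's own code, the equivalent fix should be to pass a real bool as StructField's third argument, for example fieldName[2].strip().lower() == "true", instead of the raw string.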