PySpark: get schema from a JSON file
I am trying to load a PySpark schema from a JSON file. When I build the schema from a variable in my Python code, its type is <class 'pyspark.sql.types.StructType'>, but when I read it from the JSON file, its type is unicode.
Is there any way to get a PySpark schema from a JSON file?
JSON file content:
{
"tediasessionclose_schema" : "StructType([ StructField('@timestamp', StringType()), StructField('message' , StructType([ StructField('componentAddress', StringType()), StructField('values', StructType([ StructField('confNum', StringType()), StructField('day', IntegerType())])"
}
PySpark code:
df = sc.read.json(hdfs_path, schema = jsonfile['tediasessionclose_schema'])
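The type difference described above can be reproduced without Spark. The sketch below uses a shortened, made-up schema string (not the full one from the file) and shows that a JSON parser hands back the value as plain text: `unicode` on Python 2, `str` on Python 3. Nothing in the parsing step turns that text into a `StructType`.

```python
import json

# Same shape as the config file: the schema is stored as a JSON string value.
# The schema text here is shortened for illustration.
raw = """{"tediasessionclose_schema": "StructType([StructField('day', IntegerType())])"}"""

jsonfile = json.loads(raw)
value = jsonfile["tediasessionclose_schema"]

# The parser only sees text; it does not know the text is Python code.
print(type(value))
```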
You can obtain the schema by evaluating the string that you get from reading the JSON file:
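import json
from pyspark.sql.types import StructField, StringType, IntegerType, StructType

# Read the config file; the schema is stored as a string of Python code.
with open('test.json') as f:
    data = json.load(f)

# eval() turns that string into a StructType. Only use it on files you trust.
df = sqlContext.createDataFrame([], schema=eval(data['tediasessionclose_schema']))
print(df.schema)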
Output:
StructType(List(StructField(@timestamp,StringType,true),StructField(message,StructType(List(StructField(componentAddress,StringType,true),StructField(values,StructType(List(StructField(confNum,StringType,true),StructField(day,IntegerType,true))),true))),true)))
where test.json is:
{"tediasessionclose_schema" : "StructType([ StructField('@timestamp', StringType()), StructField('message' , StructType([ StructField('componentAddress', StringType()), StructField('values', StructType([ StructField('confNum', StringType()), StructField('day', IntegerType())]))]))])"}
Hope this helps!
config_json file:
{"json_data_schema": ["contactId", "firstName", "lastName"]}
PySpark application:
schema = StructType().add("contactId", StringType()).add("firstName", StringType()).add("lastName", StringType())
Reference: https://www.python-course.eu/lambda.php
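from functools import reduce
# add() returns the StructType itself, so reduce can chain one add() per column name.
# data is the parsed config_json file shown above.
schema = reduce(lambda s, name: s.add(name, StringType(), True), data["json_data_schema"], StructType())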
Hope this solution works for you!