
How to load part of a JSON file into a DataFrame?

I have a file whose contents look like this:

a {"field1":{"field2":"val","field3":"val"...}}
b {"field1":{"field2":"val","field3":"val"...}}
...

and I was able to load the file into a table like this:

╔════╦═══════════════════════════════════════════════╗
║ ID ║ JSON                                          ║
╠════╬═══════════════════════════════════════════════╣
║  a ║ {"field1":{"field2":"val","field3":"val"...}} ║
║  b ║ {"field1":{"field2":"val","field3":"val"...}} ║
╚════╩═══════════════════════════════════════════════╝
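If the raw file is read line by line, each line can be split into its ID and its JSON text at the first run of whitespace. The helpers below (`split_line` and `parse_line`, both hypothetical names) are a minimal plain-Python sketch of that per-line step, assuming the ID never contains whitespace and everything after it is one JSON document.

```python
import json


def split_line(line):
    """Split 'a {"field1": {...}}' into ('a', '{"field1": {...}}').

    Assumes the ID contains no whitespace and is followed by whitespace
    and then the JSON document (which may itself contain spaces).
    """
    # maxsplit=1 splits only at the first whitespace run, so spaces
    # inside the JSON text are preserved.
    record_id, json_text = line.split(None, 1)
    # Drop a trailing newline or spaces left on the JSON part.
    return record_id, json_text.strip()


def parse_line(line):
    """Return (id, dict) with the JSON part decoded."""
    record_id, json_text = split_line(line)
    return record_id, json.loads(json_text)


if __name__ == "__main__":
    sample = 'a {"field1":{"field2":"val","field3":"val"}}'
    print(split_line(sample))
    print(parse_line(sample))
```

In PySpark, `split_line` could be applied with something like `spark.sparkContext.textFile(path).map(split_line).toDF(["ID", "JSON"])`, where `path` is a placeholder for the file location; this yields the two-column ID/JSON table shown above.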

How can I turn it into something like this?

╔════╦════════╦════════╦═════╦═════╗
║ ID ║ field2 ║ field3 ║ ... ║ ... ║
╠════╬════════╬════════╬═════╬═════╣
║  a ║ val    ║ val    ║ ... ║ ... ║
║  b ║ val    ║ val    ║ ... ║ ... ║
╚════╩════════╩════════╩═════╩═════╝

Since each line is only partly JSON (it starts with an ID), I cannot use read.json on the file directly. I also saw the post "convert lines of json in RDD to dataframe in apache Spark", but my JSON string is nested and very long, so I do not want to list out all the fields. I also tried:

# solr_data is the DataFrame built from the file, "json" is the column holding
# the JSON string, and session is a SparkSession
json_table = solr_data.select(solr_data["json"]).rdd.map(lambda x: session.read.json(x))

That did not work well: I can neither show() nor collect() the result, and createDataFrame() didn't work on it either.

Use select("JSON.field1.*") to "destructure" the nested JSON into columns.
