Extracting from a very complex JSON file in Python
I'm trying to pull some information out of a very complex JSON file using Python. Below is just one object from the file:
{
"__metadata": {
"uri": "/Students/news/_vti_bin/ListData.svc/Posts(4)", "etag": "W/\"2\"", "type": "Microsoft.SharePoint.DataService.PostsItem"
}, "Title": "Term 2 Round 2 draws", "Body": "<div class=\"ExternalClass0BC1BCA4D3EE45A4A1F34086034FE827\"><p>\u200bAs there is no Gonzagan this week the following Senior Sport information has been provided here.\r\n\t </p>\r\n<ul><li><a target=\"_blank\" href=\"/Intranet/students/news_resources/2011/Term2/Knox _wet_weather.pdf\">Knox _wet_weather</a> Cancellations, please see <a target=\"_blank\" href=\"http://www.twitter.com/SACWetWeather\">twitter page</a> for further news.</li>\r\n<li><a target=\"_blank\" href=\"/Intranet/students/news_resources/2011/Term2/2011_Football_round_2.pdf\">2011 Football draw Round 2</a></li>\r\n<li><a target=\"_blank\" href=\"/Intranet/students/news_resources/2011/Term2/2011_Rugby_round_2.pdf\">2011 Rugby draw Round 2</a></li></ul>\r\n<p></p></div>", "Category": {
"__deferred": {
"uri": "/Students/news/_vti_bin/ListData.svc/Posts(4)/Category"
}
}, "Published": "\/Date(1308342960000)\/", "ContentTypeID": "0x0110001F9F7104FDD3054AAB40D8561196E09E", "ApproverComments": null, "Comments": {
"__deferred": {
"uri": "/_vti_bin/ListData.svc/Posts(4)/Comments"
}
}, "CommentsId": 0, "ApprovalStatus": "0", "Id": 4, "ContentType": "Post", "Modified": "\/Date(1309122092000)\/", "Created": "\/Date(1309120597000)\/", "CreatedBy": {
"__deferred": {
"uri": "/Students/news/_vti_bin/ListData.svc/Posts(4)/CreatedBy"
}
}, "CreatedById": 1, "ModifiedBy": {
"__deferred": {
"uri": "/Students/news/_vti_bin/ListData.svc/Posts(4)/ModifiedBy"
}
}, "ModifiedById": 1, "Owshiddenversion": 2, "Version": "1.0", "Path": "/Students/news/Lists/Posts"
},
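I can't wrap my head around how to work with this. Converting it to a Python dictionary seems to scramble the order of the attributes, so I can't tell where one object ends and the next begins. What is the best way to extract just the "Title", "Body" and "Published" keys and values, and how do I do that for multiple objects?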
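import json

# json_input holds the JSON text as a string
obj = json.loads(json_input)
for record in obj:
    # Index the loop variable, not obj; key names are case-sensitive
    print(record["Title"])
    print(record["Body"])
    print(record["Published"])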
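This assumes json_input is the snippet above, either as a string or already read in from a file. Note also that the snippet above is only what I could gather from your question.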
Update
Based on the example, you have another layer that was not in the snippet originally posted.
Change the loop to:
for record in obj["d"]["results"]:
    ...
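Putting the pieces together, here is a minimal sketch of the loop over the "d" / "results" layer. The sample string is a hypothetical, heavily trimmed stand-in for the real file (the second record is invented for illustration), and extract_posts and WANTED are names made up for this example. Each element of "results" is parsed into its own dictionary, so the order of keys inside a record does not affect where one object ends and the next begins.

```python
import json

# Hypothetical sample: same shape as the data in the question, trimmed to a few keys.
json_input = """
{"d": {"results": [
    {"Title": "Term 2 Round 2 draws", "Body": "<p>first</p>",
     "Published": "/Date(1308342960000)/", "Id": 4},
    {"Title": "Another post", "Body": "<p>second</p>",
     "Published": "/Date(1309120597000)/", "Id": 5}
]}}
"""

# The only keys we want to keep from each record.
WANTED = ("Title", "Body", "Published")


def extract_posts(text):
    """Return one small dict per record, holding only the WANTED keys."""
    data = json.loads(text)
    records = data["d"]["results"]  # the extra layer mentioned in the update
    # .get() yields None instead of raising if a record lacks a key
    return [{key: record.get(key) for key in WANTED} for record in records]


posts = extract_posts(json_input)
for post in posts:
    print(post["Title"], post["Published"])
```

If the file turns out to be a bare array with no "d" / "results" wrapper, the same function should work with records = data instead.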
I'm assuming your main JSON object is an array of these objects. Here is how I would print the information you need:
import json

# json.load expects an open file object, not a file name
with open('my_json_file.json') as f:
    main_array = json.load(f)

for sub_object in main_array:
    print("Title: {}\nBody: {}\nPublished: {}\n".format(
        sub_object['Title'], sub_object['Body'], sub_object['Published']
    ))
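The "Published" value printed above comes out as a string such as /Date(1308342960000)/ once the JSON escapes are decoded. If a real timestamp is more useful, the following is a rough sketch of one way to convert it. It assumes the number is milliseconds since the Unix epoch and reads it as UTC, which the question does not confirm. parse_date_literal is a hypothetical helper name.

```python
import re
from datetime import datetime, timezone


def parse_date_literal(value):
    """Convert a '/Date(<milliseconds>)/' string to a datetime.

    Assumption: the number is milliseconds since the Unix epoch, read as UTC.
    """
    match = re.fullmatch(r"/Date\((-?\d+)\)/", value)
    if match is None:
        raise ValueError("unexpected date format: %r" % value)
    millis = int(match.group(1))
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


# Value taken from the "Published" field of the object in the question
published = parse_date_literal("/Date(1308342960000)/")
print(published.isoformat())
```

Strings that do not match the pattern raise ValueError rather than being guessed at, so an unexpected format shows up immediately.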