
Extract a specific value in nested JSON using Python

I'm using the [Wikipedia API][1], which returns the following JSON for the page 2016 United States Presidential Election.

What I'm trying to do is get the value under the key extract. The difficulty is that the page ID key (21377251 in this example) changes for each page. I currently have the function below.

Function

def fetchSummary(self, title):
    url = ("https://en.wikipedia.org/w/api.php?format=json&origin=*&action=query&prop=extracts&explaintext=false&exintro&titles="+title)
    print(url)
    response = requests.get(url)

    data = response.json()
    print(data['query']['pages'])


    return()

JSON output from link

{
    'batchcomplete': '',
    'query': {
        'pages': {
            '21377251': {
                'pageid': 21377251,
                'ns': 0,
                'title': '2016 United States presidential election',
                'extract': 'The 2016 United States presidential election was the 58th quadrennial presidential election, ....Russian government".'
            }
        }
    }
}

Once you have the JSON, you can pull out the page ID (the single key under pages), then use it to dig further and pull out the extract.

def fetchSummary(self, title):
    url = (
        "https://en.wikipedia.org/w/api.php?format=json&origin=*&action=query&"
        "prop=extracts&explaintext=false&exintro&titles="
        + title
    )
    print(url)
    response = requests.get(url)

    data = response.json()
    pg = list(data['query']['pages'])[0]
    extract = data['query']['pages'][pg]['extract']
    return extract
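
Page titles usually contain spaces, so it may be safer to let the standard library encode the query string than to concatenate the title by hand. A minimal sketch, assuming the same parameters as the URL above; `build_summary_url` and `API_ENDPOINT` are hypothetical names, not part of the original answer:

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_summary_url(title):
    """Return the query URL for a page's intro extract (hypothetical helper)."""
    params = {
        "format": "json",
        "origin": "*",
        "action": "query",
        "prop": "extracts",
        "explaintext": "false",  # kept exactly as in the URL above
        "exintro": "",           # valueless flag in the URL above
        "titles": title,
    }
    # urlencode percent-encodes spaces and other reserved characters
    return API_ENDPOINT + "?" + urlencode(params)


print(build_summary_url("2016 United States presidential election"))
```

The resulting string can be passed to requests.get(url) exactly as in the function above.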

If there is only one key each time, you can extract the page ID like this:

page = list(data['query']['pages'])[0]
print(data['query']['pages'][page]['extract'])

If there is more than one, you can get the keys with keys() and loop over them like this:

pages = list(data['query']['pages'].keys())
for page in pages:
    print(data['query']['pages'][page]['extract'])
    

If it is always one page:

list(data["query"]["pages"].values())[0]["extract"]
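
A variant of the one-page lookup that avoids building a list, wrapped in a function so it can be checked without a network call. This is only a sketch: `first_extract` is a hypothetical name, and the `sample` dict is a trimmed stand-in for the real response with a placeholder extract string.

```python
def first_extract(data):
    """Return the extract of the first page in a query response (hypothetical helper)."""
    pages = data["query"]["pages"]
    # next(iter(...)) takes the first value without copying all of them into a list
    page = next(iter(pages.values()))
    return page["extract"]


# Trimmed stand-in for the response shown in the question; the extract is a placeholder
sample = {
    "query": {
        "pages": {
            "21377251": {
                "pageid": 21377251,
                "title": "2016 United States presidential election",
                "extract": "placeholder intro text",
            }
        }
    }
}

print(first_extract(sample))
```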

If it can contain multiple pages:

for val in data["query"]["pages"].values():
    print(val["extract"])
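
If the results are needed later rather than printed, the same loop can collect them into a dict. A minimal sketch, assuming an entry might come back without an extract key (for example when a title does not exist), in which case `.get` yields None instead of raising KeyError. `extracts_by_title` is a hypothetical name, and the second entry in `sample` is invented for illustration.

```python
def extracts_by_title(data):
    """Map each page title to its extract, or None if absent (hypothetical helper)."""
    result = {}
    for page in data["query"]["pages"].values():
        # .get returns None when the page entry has no "extract" key
        result[page.get("title")] = page.get("extract")
    return result


# Stand-in data: the first entry mirrors the question, the second is made up
sample = {
    "query": {
        "pages": {
            "21377251": {
                "pageid": 21377251,
                "title": "2016 United States presidential election",
                "extract": "placeholder intro text",
            },
            "-1": {"title": "No such page"},
        }
    }
}

for title, extract in extracts_by_title(sample).items():
    print(title, "->", extract)
```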
