
Extract a specific value from nested JSON using Python

I'm using the Wikipedia API, which returns the JSON shown below for the page "2016 United States presidential election".

What I'm trying to do is get the value under the key 'extract'. The difficulty is that the key under 'pages' (in this example 21377251, which is the page ID) changes for each page. I currently have the function below.

Function

import requests

def fetchSummary(self, title):
    url = ("https://en.wikipedia.org/w/api.php?format=json&origin=*&action=query&prop=extracts&explaintext=false&exintro&titles="+title)
    print(url)
    response = requests.get(url)

    data = response.json()
    # The key under 'pages' is the page ID, which differs for every title
    print(data['query']['pages'])

    return()

JSON output from the link:

{
    'batchcomplete': '',
    'query': {
        'pages': {
            '21377251': {
                'pageid': 21377251,
                'ns': 0,
                'title': '2016 United States presidential election',
                'extract': 'The 2016 United States presidential election was the 58th quadrennial presidential election, ....Russian government".'
            }
        }
    }
}

Once you have the JSON, you can pull out the page number, then use it to dig further and pull out the extract.

import requests

def fetchSummary(self, title):
    url = (
        "https://en.wikipedia.org/w/api.php?format=json&origin=*&action=query&"
        "prop=extracts&explaintext=false&exintro&titles="
        + title
    )
    print(url)
    response = requests.get(url)

    data = response.json()
    # With a single title, the only key under 'pages' is its page ID
    pg = list(data['query']['pages'])[0]
    extract = data['query']['pages'][pg]['extract']
    return extract
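The URL above is built by concatenating the raw title, so spaces or characters such as '&' in a title are sent unescaped. A minimal sketch, assuming you'd rather let the standard library do the escaping; build_summary_url is a hypothetical helper name, and the query parameters are the same ones used in the original URL.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"

def build_summary_url(title):
    # Hypothetical helper: same parameters as the original URL,
    # but urlencode() percent-escapes every value (spaces become '+').
    params = {
        "format": "json",
        "origin": "*",
        "action": "query",
        "prop": "extracts",
        # MediaWiki treats a boolean flag as true whenever it is present,
        # so "false" here still requests plain text
        "explaintext": "false",
        "exintro": "",
        "titles": title,
    }
    return API_URL + "?" + urlencode(params)

print(build_summary_url("2016 United States presidential election"))
```

If you are already using requests, requests.get(API_URL, params=params) accepts the same dict and performs this encoding for you.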

If there is only ever one key, you can extract the page number like this:

page = list(data['query']['pages'])[0]
print(data['query']['pages'][page]['extract'])
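For the single-page case, next(iter(...)) fetches the first key without copying every key into a list. A minimal sketch, assuming data has the shape shown in the question (the sample below is an abbreviated copy with the extract text shortened):

```python
# Abbreviated copy of the response from the question (extract text shortened)
data = {
    "batchcomplete": "",
    "query": {
        "pages": {
            "21377251": {
                "pageid": 21377251,
                "ns": 0,
                "title": "2016 United States presidential election",
                "extract": "The 2016 United States presidential election was ...",
            }
        }
    },
}

pages = data["query"]["pages"]
page = next(iter(pages))  # first (and here only) key: the page ID as a string
print(pages[page]["extract"])
```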

If there can be more than one, get the keys with keys() and loop over them like this:

pages = list(data['query']['pages'].keys())
for page in pages:
    print(data['query']['pages'][page]['extract'])
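When several titles are requested (the titles parameter accepts multiple titles separated by '|'), a title that does not exist comes back under a negative page ID such as "-1" with a 'missing' key and no 'extract', so indexing ['extract'] directly would raise KeyError. A minimal sketch, assuming that response shape; extracts_by_title is a hypothetical helper, and the sample data is made up for illustration rather than captured from the API.

```python
def extracts_by_title(data):
    """Map each page title to its extract, or None if the page has none."""
    result = {}
    for page in data["query"]["pages"].values():
        # Missing pages carry a "missing" key and no "extract"
        result[page["title"]] = page.get("extract")
    return result

# Illustrative response shape (not real API output): one existing page, one missing
sample = {
    "query": {
        "pages": {
            "21377251": {"pageid": 21377251, "ns": 0,
                         "title": "2016 United States presidential election",
                         "extract": "The 2016 United States presidential election was ..."},
            "-1": {"ns": 0, "title": "No such page xyz", "missing": ""},
        }
    }
}

for title, extract in extracts_by_title(sample).items():
    print(title, "->", extract if extract is not None else "(no extract)")
```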
    

If it is always one page:

list(data["query"]["pages"].values())[0]["extract"]

If it can contain multiple pages:

for val in data["query"]["pages"].values():
    print(val["extract"])
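The MediaWiki API also accepts formatversion=2, under which query.pages is returned as a list instead of an object keyed by page ID, so the changing key disappears altogether. A minimal sketch, assuming that list shape; the response below is illustrative, not captured output.

```python
from urllib.parse import urlencode

# Same request as in the question, plus formatversion=2
url = "https://en.wikipedia.org/w/api.php?" + urlencode({
    "format": "json",
    "formatversion": "2",
    "action": "query",
    "prop": "extracts",
    "exintro": "1",
    "explaintext": "1",
    "titles": "2016 United States presidential election",
})
print(url)

# Illustrative formatversion=2 response shape: "pages" is a list, no page-ID keys
data = {
    "batchcomplete": True,
    "query": {
        "pages": [
            {"pageid": 21377251, "ns": 0,
             "title": "2016 United States presidential election",
             "extract": "The 2016 United States presidential election was ..."},
        ]
    },
}

extract = data["query"]["pages"][0]["extract"]
print(extract)
```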
