
Trying to extract data from JSON URL into Pandas

I am trying to extract data from a JSON URL into pandas, but this file has multiple "layers" of lists and dictionaries that I just cannot seem to navigate.

import json
from urllib.request import urlopen

with urlopen('https://statdata.pgatour.com/r/010/2020/player_stats.json') as response:
    source = response.read()

data = json.loads(source)

for item in data['tournament']['players']:
    pid = item['pid']
    statId = item['stats']['statId']
    name = item['stats']['name']
    tValue = item['stats']['tValue']
    print(pid, statId, name, tValue)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-84-eadd8bdb34cb> in <module>
      1 for item in data['tournament']['players']:
      2     player_id = item['pid']
----> 3     stat_id = item['stats']['statId']
      4     stat_name = item['stats']['name']
      5     stat_value = item['stats']['tValue']

TypeError: list indices must be integers or slices, not str

The output I am trying to get looks like this:

[image: desired output]

You are missing a layer.

Simplified, the part of the data we are trying to access looks like this:

"stats": [{
    "statId":"106",
    "name":"Eagles",
    "tValue":"0"
}]

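The value of 'stats' starts with [{, which means it is a list (JSON array) of dictionaries. A list can only be indexed with integers or slices, which is exactly what the TypeError says, so you need one more loop (or an integer index) to get from the list to each stat dictionary.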
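To see the extra layer concretely, here is a minimal sketch on a small, made-up player entry (the pid value is hypothetical; only the statId/name/tValue fields from the snippet above are included). It shows that item['stats'] is a list, reproduces the TypeError, and then reaches the dictionary by indexing the list first:

```python
# Hypothetical entry shaped like one element of data['tournament']['players'];
# the real feed has more fields, omitted here.
item = {
    "pid": "12345",  # made-up player id
    "stats": [
        {"statId": "106", "name": "Eagles", "tValue": "0"},
    ],
}

# 'stats' is a list, not a dict
print(type(item["stats"]))

try:
    item["stats"]["statId"]  # indexing a list with a string key
except TypeError as e:
    print("TypeError:", e)

# Index the list with an integer first, then the dict with a string key
first_stat = item["stats"][0]
print(first_stat["statId"], first_stat["name"], first_stat["tValue"])
```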

I think this should work:

for item in data['tournament']['players']:
    pid = item['pid']
    for stat in item['stats']:
        statId = stat['statId']
        name = stat['name']
        tValue = stat['tValue']
        print(pid, statId, name, tValue)
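Since the goal is a pandas DataFrame, pandas.json_normalize can flatten this nesting directly. The sketch below runs on a small, hypothetical players list shaped like data['tournament']['players'] (the pid values and the second stat, "ExampleStat", are made up); record_path="stats" produces one row per stat, and meta=["pid"] copies each player's pid onto its rows:

```python
import pandas as pd

# Hypothetical data mimicking data['tournament']['players']
players = [
    {"pid": "11111", "stats": [
        {"statId": "106", "name": "Eagles", "tValue": "0"},
        {"statId": "999", "name": "ExampleStat", "tValue": "5"},  # made-up stat
    ]},
    {"pid": "22222", "stats": [
        {"statId": "106", "name": "Eagles", "tValue": "1"},
    ]},
]

# record_path walks into each player's 'stats' list (one row per stat);
# meta copies the parent-level 'pid' onto every one of those rows.
df = pd.json_normalize(players, record_path="stats", meta=["pid"])
print(df[["pid", "statId", "name", "tValue"]])
```

With the real feed, the same call would be pd.json_normalize(data['tournament']['players'], record_path='stats', meta=['pid']), assuming every player entry has a 'stats' list; other player-level fields can be added to meta.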

For more on iterating through dictionaries, see: https://realpython.com/iterate-through-dictionary-python/

As the previous answer says, stats is a list of stat items. The version below loops over that list, and the try/except also catches any other problem by printing the error together with the player entry that caused it:

import json
from urllib.request import urlopen

with urlopen('https://statdata.pgatour.com/r/010/2020/player_stats.json') as response:
    source = response.read()

data = json.loads(source)

for item in data['tournament']['players']:
    try:
        pid = item['pid']
        stats = item['stats']
        for stat in stats:
            statId = stat['statId']
            name = stat['name']
            tValue = stat['tValue']
            print(pid, statId, name, tValue)
    except Exception as e:
        print(e)
        print(item)
        break
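If you prefer to keep the explicit loop, a minimal sketch is to collect one dict per stat instead of printing, then build the DataFrame once at the end. This again uses made-up sample data; converting tValue with pd.to_numeric and pivoting to one row per player are assumptions about the layout you want, and pivot requires each (pid, name) pair to appear only once:

```python
import pandas as pd

# Hypothetical data mimicking data['tournament']['players']
players = [
    {"pid": "11111", "stats": [
        {"statId": "106", "name": "Eagles", "tValue": "0"},
        {"statId": "999", "name": "ExampleStat", "tValue": "5"},  # made-up stat
    ]},
    {"pid": "22222", "stats": [
        {"statId": "106", "name": "Eagles", "tValue": "1"},
    ]},
]

rows = []
for item in players:
    pid = item["pid"]
    # .get() with a default skips players that have no 'stats' key
    for stat in item.get("stats", []):
        rows.append({
            "pid": pid,
            "statId": stat["statId"],
            "name": stat["name"],
            "tValue": stat["tValue"],
        })

df = pd.DataFrame(rows, columns=["pid", "statId", "name", "tValue"])
# tValue arrives as a string; coerce to numbers (unparseable values become NaN)
df["tValue"] = pd.to_numeric(df["tValue"], errors="coerce")

# Optional wide layout: one row per player, one column per stat name
wide = df.pivot(index="pid", columns="name", values="tValue")
print(wide)
```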
