
Iterating through a JSON file in Python 3

Currently I'm trying to get 'stringency' data from a JSON file which contains dates and countries. Here's how I load the file, followed by an excerpt of what the data looks like:

import pandas as pd
import json
from bs4 import BeautifulSoup

# load file
with open("Stringency April 8.txt") as file:
    stringency_data = json.load(file)

stringency_data["data"]

#this gives the output:

{'2020-01-02': {'ABW': {'confirmed': None,
   'country_code': 'ABW',
   'date_value': '2020-01-02',
   'deaths': None,
   'stringency': 0,
   'stringency_actual': 0},
  'AFG': {'confirmed': 0,
   'country_code': 'AFG',
   'date_value': '2020-01-02',
   'deaths': 0,
   'stringency': 0,
   'stringency_actual': 0},
  'AGO': {'confirmed': None,
   'country_code': 'AGO',
   'date_value': '2020-01-02',
   'deaths': None,
   'stringency': 0,
   'stringency_actual': 0},
  'AUS': {'confirmed': 0,
   'country_code': 'AUS',
   'date_value': '2020-01-02',
   'deaths': 0,
   'stringency': 7.14,
   'stringency_actual': 7.14},
  'AUT': {'confirmed': 0,
   'country_code': 'AUT',
   'date_value': '2020-01-02',
   'deaths': 0,
   'stringency': 0,
   'stringency_actual': 0},.........

Here's my code so far (I've shortened it a bit for the sake of this post):

# create empty list for dates

date_index = []
[date_index.append(date) for date in stringency_data["data"]]



#creates empty lists for countries

Australia = []
Austria = []
...
US = []

# put these lists into a list
countries_lists = [Australia, Austria,...US]

# put country codes into a list
country_codes = ["AUS", "AUT",..."USA"]
# loop through countries

i = 0

for country, code in zip(countries_lists, country_codes):
    while i<=len(date_index):
        country.append(stringency_data["data"][date_index[i]][code]["stringency_actual"])
        i+=1

When I print the list "Australia" I get all the values I want. But every country from Austria onwards is still an empty list.

I also get the error KeyError: "AUS". This suggests that the code retrieved the whole time series, but only for the first country (Australia). How can I loop over every country code?

Here's what I see about the data you've described/shown:

file data is a dictionary; single known/desired key is "data", value is a dictionary.
--> keys are all date_strings.  Each value is a dictionary.
-----> keys are all country_codes.  Each value is a dictionary.
--------> a key "stringency_actual" is present, and its value is desired.
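
A quick way to confirm this shape before writing the extraction loop is to print the type and a few keys at each level. This is only a sketch: the name `sample` and its contents are assumptions standing in for the real file.

```python
import json

# Tiny stand-in for the real file contents (assumed, for illustration only).
sample = json.loads("""
{"data": {"2020-01-02": {"AUS": {"stringency_actual": 7.14},
                         "AUT": {"stringency_actual": 0}}}}
""")

by_date = sample["data"]              # level 1: date_string -> dictionary
first_date = next(iter(by_date))      # pick any one date key
by_country = by_date[first_date]      # level 2: country_code -> dictionary
first_code = next(iter(by_country))   # pick any one country code
record = by_country[first_code]       # level 3: field name -> value

# Show the type and the first few keys found at each level.
print(type(by_date).__name__, list(by_date)[:3])
print(type(by_country).__name__, list(by_country)[:3])
print(type(record).__name__, list(record))
```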

So a straightforward plan for getting this data out could look like this:

1. grab file['data']
2. iterate over all keys and values in this dictionary.  Keys are date_strings.  (Actually, you may only care about the values.)
3. for each of those values (one dictionary per date), iterate over all of its keys and values.  Keys are country_codes, which tell you to which list you want to append the stringency_actual value you're about to get.
4. grab 'stringency_actual' from that country's dictionary and append it to the list corresponding to the correct country.
4b. translate the country_code to the country_name, since that's apparently how you would like to store this data for now.

I changed the data retrieval to iterate over the dictionaries directly, because the data is all dictionaries and is therefore self-describing by its keys. Walking the keys that actually exist, rather than looking up a predefined list of dates and country codes, can help prevent the KeyError I see mentioned in the original question and a comment. (Without the complete input file or the line number of the KeyError, I think none of us is 100% certain which value in the input is causing that KeyError.)
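
If you want to pin down which lookup fails, one option is to scan the data and report every date and country code combination that is missing. The following is a sketch under assumptions: the sample data, the list `wanted_codes`, and the helper `find_missing` are all hypothetical names made up for illustration, not part of the original file or code.

```python
import json

# Inline sample standing in for the real file (assumed data): "AUT" is
# missing on the second date, and "AUS" lacks "stringency_actual" there.
stringency_sample = json.loads("""
{"data": {"2020-01-02": {"AUS": {"stringency_actual": 7.14},
                         "AUT": {"stringency_actual": 0}},
          "2020-01-03": {"AUS": {"stringency": 7.14}}}}
""")

wanted_codes = ["AUS", "AUT"]  # hypothetical list of codes to check


def find_missing(data_by_date, codes, field="stringency_actual"):
    """Return (date, code, reason) tuples for every lookup that would fail."""
    problems = []
    for date_string, data_this_date in data_by_date.items():
        for code in codes:
            if code not in data_this_date:
                problems.append((date_string, code, "no such country code"))
            elif field not in data_this_date[code]:
                problems.append((date_string, code, f"no {field!r} field"))
    return problems


missing = find_missing(stringency_sample["data"], wanted_codes)
for problem in missing:
    print(problem)
```

Run against the real file, an empty result would suggest the KeyError comes from somewhere other than a missing country code or field.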

Potential answer:

import json

# Input sample data; would actually be retrieved from file.
stringency_data = json.loads("""
{"data": {"2020-01-02": {"ABW": {"confirmed": null,
   "country_code": "ABW",
   "date_value": "2020-01-02",
   "deaths": null,
   "stringency": 0,
   "stringency_actual": 0},
  "AFG": {"confirmed": 0,
   "country_code": "AFG",
   "date_value": "2020-01-02",
   "deaths": 0,
   "stringency": 0,
   "stringency_actual": 0},
  "AGO": {"confirmed": null,
   "country_code": "AGO",
   "date_value": "2020-01-02",
   "deaths": null,
   "stringency": 0,
   "stringency_actual": 0},
  "AUS": {"confirmed": 0,
   "country_code": "AUS",
   "date_value": "2020-01-02",
   "deaths": 0,
   "stringency": 7.14,
   "stringency_actual": 7.14},
  "AUT": {"confirmed": 0,
   "country_code": "AUT",
   "date_value": "2020-01-02",
   "deaths": 0,
   "stringency": 0,
   "stringency_actual": 0}}}
}""")

country_name_by_code = {
    'ABW': 'Aruba',
    'AFG': 'Afghanistan',
    'AUS': 'Australia',
    'AUT': 'Austria',
#    ...
    'USA': 'United States'
}

# Output data we want to create
actual_stringencies_by_country_name = {}


# Helper function to store data we're interested in
def append_country_stringency(country_code, actual_stringency_value):
    if country_code not in country_name_by_code:
        print(f'Unknown country_code value "{country_code}"; ignoring.')
        return

    country_name = country_name_by_code[country_code]

    if country_name not in actual_stringencies_by_country_name:
        actual_stringencies_by_country_name[country_name] = []

    actual_stringencies_by_country_name[country_name].append(actual_stringency_value)


# Walk our input data and store the parts we're looking for
for date_string, data_this_date in stringency_data['data'].items():
    for country_code, country_data in data_this_date.items():
        append_country_stringency(country_code, country_data['stringency_actual'])

print(actual_stringencies_by_country_name)

My output:

C:\some_dir>python test.py
Unknown country_code value "AGO"; ignoring.
{'Aruba': [0], 'Afghanistan': [0], 'Australia': [7.14], 'Austria': [0]}
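
One caveat with collecting values this way: if a country is absent on some dates, its list ends up shorter than the number of dates, so list positions no longer line up with dates. If you need each list aligned with the dates, a variant along these lines may help. It is a sketch, assuming that filling gaps with None is acceptable; the sample data and the names `country_codes` and `series_by_code` are made up for illustration.

```python
import json

# Inline sample standing in for the real file (assumed data).
stringency_sample = json.loads("""
{"data": {"2020-01-02": {"AUS": {"stringency_actual": 7.14},
                         "AUT": {"stringency_actual": 0}},
          "2020-01-03": {"AUS": {"stringency_actual": 11.9}}}}
""")

country_codes = ["AUS", "AUT"]  # codes of interest (hypothetical selection)

data_by_date = stringency_sample["data"]
date_index = sorted(data_by_date)  # ISO date strings sort chronologically

# One list per country code, each the same length as date_index.
series_by_code = {code: [] for code in country_codes}
for date_string in date_index:
    data_this_date = data_by_date[date_string]
    for code in country_codes:
        # .get() returns a default instead of raising KeyError when the
        # country, or the field, is absent on this date.
        record = data_this_date.get(code, {})
        series_by_code[code].append(record.get("stringency_actual"))

print(date_index)
print(series_by_code)
```

Nesting two for loops over the dates and codes also sidesteps two problems in the question's loop. There, `i` is set to 0 only once, before the outer loop, so after the first country the `while` condition is already false and every later list stays empty. In addition, `i <= len(date_index)` allows `i` to reach `len(date_index)`, which is one past the last valid index.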


 