Python API Call: JSON to Pandas DF

I'm pulling data from a public API and converting the JSON response to a Pandas DataFrame. I've written the code to pull the data and I get a successful JSON response. The issue I'm having is parsing the response and converting the data to a DataFrame. Whenever I run my for loop, I get a DataFrame with 1 row when it should have approximately 2500 rows and 6 columns. I've copied and pasted my code below:

Things to note: I've replaced my API key with the placeholder "api_key". I'm new(ish) to Python, so I understand that my code formatting might not follow best practices. I'm open to changes. Here is the link to the API that I am requesting from: https://developer.va.gov/explore/facilities/docs/facilities?version=current

facilities_data = pd.DataFrame(columns=['geometry_type', 'geometry_coordinates', 'id', 'facility_name', 'facility_type','facility_classification'])


# function that will make the api call and sort through the json data
def get_facilities_data(facilities_data):
    # Make API Call
    res = requests.get('https://sandboxapi.va.gov/services/va_facilities/v0/facilities/all',headers={'apikey': 'api_key'})
    data = json.loads(res.content.decode('utf-8'))
    time.sleep(1)

    for facility in data['features']:
        geometry_type = data['features'][0]['geometry']['type']
        geometry_coordinates = data['features'][0]['geometry']['coordinates']
        facility_id = data['features'][0]['properties']['id']
        facility_name = data['features'][0]['properties']['name']
        facility_type = data['features'][0]['properties']['facility_type']
        facility_classification = data['features'][0]['properties']['classification']

    # Save data into pandas dataframe
    facilities_data = facilities_data.append(
        {'geometry_type': geometry_type, 'geometry_coordinates': geometry_coordinates,
         'facility_id': facility_id, 'facility_name': facility_name, 'facility_type': facility_type,
         'facility_classification': facility_classification}, ignore_index=True)
    return facilities_data


facilities_data = get_facilities_data(facilities_data)
print(facilities_data)


As mentioned, you should

  1. read each field from the loop variable facility instead of data['features'][0], which is always the first record
  2. append within the loop, so that a row is stored on every iteration rather than once after the loop ends

This will get you the result you are after.

    import json
    import time

    import pandas as pd
    import requests

    # Column names must match the dict keys appended below
    facilities_data = pd.DataFrame(columns=['geometry_type', 'geometry_coordinates', 'facility_id', 'facility_name', 'facility_type', 'facility_classification'])

    def get_facilities_data(facilities_data):
        # Make API call ("api_key" is a placeholder for your own key)
        res = requests.get("https://sandbox-api.va.gov/services/va_facilities/v0/facilities/all",
                           headers={"apikey": "api_key"})
        data = json.loads(res.content.decode('utf-8'))
        time.sleep(1)

        for facility in data['features']:
            # Read every field from the current facility, not data['features'][0]
            geometry_type = facility['geometry']['type']
            geometry_coordinates = facility['geometry']['coordinates']
            facility_id = facility['properties']['id']
            facility_name = facility['properties']['name']
            facility_type = facility['properties']['facility_type']
            facility_classification = facility['properties']['classification']

            # Save this facility as a new row, inside the loop.
            # Note: DataFrame.append was removed in pandas 2.0, so this needs an older pandas.
            facilities_data = facilities_data.append(
                {'geometry_type': geometry_type, 'geometry_coordinates': geometry_coordinates,
                 'facility_id': facility_id, 'facility_name': facility_name, 'facility_type': facility_type,
                 'facility_classification': facility_classification}, ignore_index=True)
        return facilities_data


    facilities_data = get_facilities_data(facilities_data)
    print(facilities_data.head())
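To see why the original loop produced a single row, both mistakes can be reproduced offline on a tiny made-up payload. This is only a sketch: the `sample` dict, its IDs and names, and the helper names `rows_buggy` and `rows_fixed` are hypothetical, and only mimic the shape of the response (a `features` list whose items have `geometry` and `properties`). Rows are collected in a plain list so the sketch does not depend on `DataFrame.append`, which was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical payload that only mimics the shape of the API response.
sample = {
    "features": [
        {"geometry": {"type": "Point", "coordinates": [-77.0, 38.9]},
         "properties": {"id": "vha_001", "name": "Facility A"}},
        {"geometry": {"type": "Point", "coordinates": [-122.3, 47.6]},
         "properties": {"id": "vha_002", "name": "Facility B"}},
        {"geometry": {"type": "Point", "coordinates": [-87.6, 41.9]},
         "properties": {"id": "vha_003", "name": "Facility C"}},
    ]
}


def rows_buggy(data):
    """Mirror the question's logic: read features[0] every time, store once."""
    rows = []
    for facility in data["features"]:
        # Mistake 1: the loop variable `facility` is never used.
        facility_id = data["features"][0]["properties"]["id"]
        facility_name = data["features"][0]["properties"]["name"]
    # Mistake 2: this runs once, after the loop has finished.
    rows.append({"facility_id": facility_id, "facility_name": facility_name})
    return rows


def rows_fixed(data):
    """Read from the loop variable and store a row on every iteration."""
    rows = []
    for facility in data["features"]:
        rows.append({"facility_id": facility["properties"]["id"],
                     "facility_name": facility["properties"]["name"]})
    return rows


buggy_df = pd.DataFrame(rows_buggy(sample))
fixed_df = pd.DataFrame(rows_fixed(sample))
print(len(buggy_df), len(fixed_df))  # 1 3
```

The buggy version always ends up with exactly one row, holding the first facility, no matter how many features the payload contains.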

There are some more things we can improve upon:

  • json() can be called directly on the requests response
  • time.sleep() is not needed
  • appending to a DataFrame on each iteration is discouraged; we can collect the data in another way and create the DataFrame afterwards

Implementing these improvements results in:

    import pandas as pd
    import requests


    def get_facilities_data():
        # Parse the JSON body directly from the response ("REDACTED" stands in for your API key)
        data = requests.get("https://sandbox-api.va.gov/services/va_facilities/v0/facilities/all",
                            headers={"apikey": "REDACTED"}).json()

        # Collect one tuple per facility; build the DataFrame once at the end
        facilities_data = []
        for facility in data["features"]:
            facility_data = (facility["geometry"]["type"],
                             facility["geometry"]["coordinates"],
                             facility["properties"]["id"],
                             facility["properties"]["name"],
                             facility["properties"]["facility_type"],
                             facility["properties"]["classification"])
            facilities_data.append(facility_data)

        facilities_df = pd.DataFrame(data=facilities_data,
                                     columns=["geometry_type", "geometry_coords", "id", "name", "type", "classification"])
        return facilities_df
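
As an alternative sketch, pandas can flatten the nested records itself with `pd.json_normalize`, which turns nested dicts into dotted column names such as `geometry.type`. The `sample` payload below is hypothetical (made-up IDs, names, and values that only mimic the response shape), and the rename mapping is just one way to arrive at the same six column names used above.

```python
import pandas as pd

# Hypothetical payload that only mimics the shape of the API response.
sample = {
    "features": [
        {"geometry": {"type": "Point", "coordinates": [-77.0, 38.9]},
         "properties": {"id": "vha_001", "name": "Facility A",
                        "facility_type": "va_health_facility",
                        "classification": "Example Clinic"}},
        {"geometry": {"type": "Point", "coordinates": [-122.3, 47.6]},
         "properties": {"id": "vha_002", "name": "Facility B",
                        "facility_type": "va_health_facility",
                        "classification": "Example Center"}},
    ]
}

# Flatten nested dicts into columns like "geometry.type" and "properties.id".
flat = pd.json_normalize(sample["features"])

# Keep only the six fields of interest and give them shorter names.
columns = {
    "geometry.type": "geometry_type",
    "geometry.coordinates": "geometry_coords",
    "properties.id": "id",
    "properties.name": "name",
    "properties.facility_type": "type",
    "properties.classification": "classification",
}
facilities_df = flat[list(columns)].rename(columns=columns)
print(facilities_df.shape)  # (2, 6)
```

With a real response, `sample` would be replaced by the parsed JSON; list values such as the coordinates stay as lists in a single column.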
