
Unable to loop through paged API responses with Python

So, I'm scratching my head with this one. Using HubSpot's API, I need to get a list of ALL the companies in my client's "portal" (account). Sadly, the standard API call only returns 100 companies at a time. When it does return a response, it includes two parameters which make paging through responses possible.

One of those is "has-more": true (this lets you know whether to expect any more pages) and the other is "offset": 12345678 (the timestamp to offset the request by).

These two parameters can be passed back into the next API call to get the next page. So for example, the initial API call might look like:

"https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)

Whereas the follow-up calls might look like:

"https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}&offset={offset}".format(hapikey=wta_hubspot_api_key, offset=offset)

So this is what I've tried so far:

#!/usr/bin/python
# -*- coding: utf-8 -*-

import sys
import os.path
import requests
import json
import csv
import glob2
import shutil
import time
import time as howLong
from time import sleep
from time import gmtime, strftime

HubSpot_Customer_Portal_ID = "XXXXXX"

wta_hubspot_api_key = "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"

findCSV = glob2.glob('*contact*.csv')

theDate = strftime("%Y-%m-%d", gmtime())
theTime = strftime("%H:%M:%S", gmtime())

try:
    testData = findCSV[0]
except IndexError:
    print("\nSynchronisation attempted on {date} at {time}: There are no \"contact\" CSVs, please upload one and try again.\n".format(date=theDate, time=theTime))
    print("====================================================================================================================\n")
    sys.exit()

for theCSV in findCSV:

    def get_companies():
        create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)
        headers = {'content-type': 'application/json'}
        create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)
        if create_get_recent_companies_response.status_code == 200:

            offset = create_get_recent_companies_response.json()[u'offset']
            hasMore = create_get_recent_companies_response.json()[u'has-more']

            while hasMore == True:
                for i in create_get_recent_companies_response.json()[u'companies']:
                    get_more_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}&offset={offset}".format(hapikey=wta_hubspot_api_key, offset=offset)
                    get_more_companies_call_response = requests.get(get_more_companies_call, headers=headers)
                    companyName = i[u'properties'][u'name'][u'value']
                    print("{companyName}".format(companyName=companyName))


        else:
            print("Something went wrong, check the supplied field values.\n")
            print(json.dumps(create_get_recent_companies_response.json(), sort_keys=True, indent=4))

    if __name__ == "__main__":
        get_companies()
        sys.exit()

The problem is that it just keeps returning the same initial 100 results; this is happening because the parameter "has-more": true is true on the initial call, so it'll just keep returning the same companies...

My ideal scenario is that I'm able to parse ALL the companies across approximately 120 response pages (there are around 12,000 companies). As I pass through each page, I'd like to append its JSON content to a list, so that eventually I have a list containing the JSON responses of all 120 pages, which I can then parse for use in a different function.
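The accumulation pattern that scenario describes can be sketched as follows. `fetch_page` here is a stand-in for the real `requests.get` call against the HubSpot endpoint (it fakes a two-page result so the loop logic can be shown without network access):

```python
# Sketch: keep requesting pages until "has-more" is false, appending each
# page's companies to one list. fetch_page is a fake stand-in for the real
# requests.get call, returning two hard-coded pages for illustration.

def fetch_page(offset=None):
    if offset is None:
        # First page: more results remain, and a new offset is supplied.
        return {"companies": [{"name": "Acme"}], "has-more": True, "offset": 100}
    # Second (final) page.
    return {"companies": [{"name": "Globex"}], "has-more": False, "offset": 200}

def get_all_companies():
    all_companies = []
    offset = None
    while True:
        page = fetch_page(offset)
        all_companies.extend(page["companies"])
        if not page["has-more"]:
            break
        offset = page["offset"]  # use the NEW offset from each response
    return all_companies
```

The key detail is that `offset` is re-read from every response before the next request is made.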

I am in desperate need of a solution :(

This is the function I am replacing in my main script:

            def get_companies():

                create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/recent/modified?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)
                headers = {'content-type': 'application/json'}
                create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)
                if create_get_recent_companies_response.status_code == 200:

                    for i in create_get_recent_companies_response.json()[u'results']:
                        company_name = i[u'properties'][u'name'][u'value']
                        #print(company_name)
                        if row[0].lower() == str(company_name).lower():
                            contact_company_id = i[u'companyId']
                            #print(contact_company_id)
                            return contact_company_id
                else:
                    print("Something went wrong, check the supplied field values.\n")
                    #print(json.dumps(create_get_recent_companies_response.json(), sort_keys=True, indent=4))

The problem seems to be that:

  • You get the offset in your first call, but don't do anything with the actual companies data that this call returns.
  • You then use this same offset in your while loop; you never use the new one from subsequent calls. This is why you get the same companies every time.

I think this code for get_companies() should work for you. I can't test it, obviously, but hopefully it is OK:

def get_companies():
    create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}".format(hapikey=wta_hubspot_api_key)
    headers = {'content-type': 'application/json'}
    create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)
    if create_get_recent_companies_response.status_code == 200:
        while True:
            for i in create_get_recent_companies_response.json()[u'companies']:
                companyName = i[u'properties'][u'name'][u'value']
                print("{companyName}".format(companyName=companyName))
            offset = create_get_recent_companies_response.json()[u'offset']
            hasMore = create_get_recent_companies_response.json()[u'has-more']
            if not hasMore:
                break
            else:
                create_get_recent_companies_call = "https://api.hubapi.com/companies/v2/companies/?hapikey={hapikey}&offset={offset}".format(hapikey=wta_hubspot_api_key, offset=offset)
                create_get_recent_companies_response = requests.get(create_get_recent_companies_call, headers=headers)
    else:
        print("Something went wrong, check the supplied field values.\n")
        print(json.dumps(create_get_recent_companies_response.json(), sort_keys=True, indent=4))

Strictly, the else after the break isn't required, but it is in keeping with the Zen of Python: "Explicit is better than implicit".

Note that you are only checking for a 200 response code once; if something goes wrong inside your loop you will miss it. You should probably put all your calls inside the loop and check for a proper response every time.
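That per-call check could look something like the sketch below. The `session` parameter is my addition so a fake transport can be injected for testing; the key is a placeholder, and `raise_for_status()` is the standard requests way to fail loudly on any non-2xx page:

```python
# Sketch: the request and its status check both live inside the loop,
# so a failure on any page is caught immediately, not just the first.
import requests

COMPANIES_URL = "https://api.hubapi.com/companies/v2/companies/"

def iter_company_pages(hapikey, session=None):
    session = session or requests.Session()
    params = {"hapikey": hapikey}
    while True:
        resp = session.get(COMPANIES_URL, params=params)
        resp.raise_for_status()  # raises on any non-2xx response, every page
        data = resp.json()
        yield data["companies"]
        if not data["has-more"]:
            return
        params["offset"] = data["offset"]  # carry the new offset forward
```

Being a generator, it also lets the caller accumulate `list(iter_company_pages(key))` or process each page as it arrives.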
