Aggregate multiple API request results using Python

I'm working on an application that has to use multiple external APIs for information and, after processing the data, output the result to a client. The client queries through a web interface; once the query is sent to the server, the server process sends requests to the different API providers and, after joining the responses from those APIs, returns the combined response to the client.

All responses are in JSON.

Current approach:

import requests

def get_results(city, country, query, listing_type, position):
    # Get the list of API endpoint URLs (with authentication) for this query
    apis = get_list_of_apis(listing_type, position)
    results = []
    for api in apis:
        response = requests.get(api)
        data = response.json()  # parse JSON
        # TODO: combine result into a uniform format to display
        results.append(data)
    return results

The server uses Django to generate the response.
Problems with this approach:
(i) This may generate huge amounts of data even though the client is not interested in all of it.
(ii) The JSON responses have to be parsed according to each provider's API spec.

How can I do this efficiently?

Note: The queries are being done to serve job listings.

Most APIs of this nature allow for some sort of "paging". You should code your requests to pull only a single page from each provider. You can then consolidate the several pages locally into a single stream.
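A minimal sketch of that idea, assuming each provider accepts `page` and `per_page` query parameters (real providers will use their own parameter names, and the URLs below are made up):

```python
import requests

# Hypothetical provider endpoints; in the question these come from get_list_of_apis()
PROVIDERS = [
    "https://jobs.example-a.com/search",
    "https://api.example-b.com/listings",
]

def fetch_page(base_url, query, page, per_page=10):
    """Fetch a single page of results from one provider."""
    response = requests.get(
        base_url,
        params={"q": query, "page": page, "per_page": per_page},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()

def fetch_all_providers(query, page):
    """Pull one page from each provider and consolidate into one list."""
    combined = []
    for url in PROVIDERS:
        combined.extend(fetch_page(url, query, page))
    return combined
```

Each user-visible page then triggers at most one request per provider instead of pulling every provider's full result set.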

If we assume you have 3 providers and the page size is fixed at 10, you will get 30 responses. Assuming you only show 10 listings to the client, you will have to discard and re-query 20 listings. A better idea might be to cache the query results locally for a short time (say 15 minutes to an hour) so that you don't have to re-query the upstream providers each time your user advances a page in the consolidated list.
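A minimal sketch of such a short-lived cache, using a plain in-process dict with a TTL (in a real Django deployment you would more likely use Django's cache framework or Redis):

```python
import time

CACHE_TTL = 15 * 60  # seconds; "15 minutes to an hour" per the advice above
_cache = {}          # maps a query key -> (timestamp, results)

def cached_results(key, fetch_fn):
    """Return cached results for `key`, calling fetch_fn() only when the entry is stale."""
    now = time.time()
    entry = _cache.get(key)
    if entry is not None and now - entry[0] < CACHE_TTL:
        return entry[1]          # still fresh: no upstream requests
    results = fetch_fn()         # e.g. query all upstream providers
    _cache[key] = (now, results)
    return results
```

Keying the cache on the consolidated query (city, country, query string, etc.) means paging through the merged list hits the cache instead of the upstream APIs.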

As for the different parsing required by different providers, you will have to handle that internally. Create a different class for each. The list of providers is fixed and small, so you can code a table mapping each provider URL to its class.
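One way to sketch that table of per-provider parsers (the class names, URLs, and JSON field names here are all invented for illustration; each real provider's spec dictates its parser):

```python
class ProviderAParser:
    """Normalises Provider A's JSON into the app's uniform listing format."""
    def parse(self, payload):
        return [{"title": item["jobTitle"], "city": item["location"]}
                for item in payload["jobs"]]

class ProviderBParser:
    """Provider B nests listings under 'results' with different field names."""
    def parse(self, payload):
        return [{"title": item["name"], "city": item["city"]}
                for item in payload["results"]]

# Fixed, small table: which provider URL gets which parser behaviour
PARSERS = {
    "https://jobs.example-a.com/search": ProviderAParser(),
    "https://api.example-b.com/listings": ProviderBParser(),
}

def parse_response(provider_url, payload):
    return PARSERS[provider_url].parse(payload)
```

The rest of the application then only ever sees the uniform format, regardless of which upstream API produced the data.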

Shameless plug, but I wrote a post on how I did exactly this in Django REST framework here.

I highly recommend using Django REST framework; it makes everything so much easier.

Basically, the model on your API's end is extremely simple and contains only information on which external API is used and the ID of that API resource. A GenericProvider class then provides an abstract interface to perform CRUD operations on the external source. This GenericProvider uses other providers that you create and determines which provider to use via the provider field on the model. All of the data returned by the GenericProvider is then serialised as usual.
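A plain-Python sketch of that dispatch idea (this is not the answer's actual DRF code: the Django model is reduced to a simple record, and the provider classes are hypothetical):

```python
class Listing:
    """Stand-in for the model: which external API, and the resource's ID there."""
    def __init__(self, provider, external_id):
        self.provider = provider        # e.g. "provider_a"
        self.external_id = external_id  # ID of the resource on that API

class ProviderA:
    def retrieve(self, external_id):
        # Would call Provider A's external API here
        return {"source": "provider_a", "id": external_id}

class ProviderB:
    def retrieve(self, external_id):
        # Would call Provider B's external API here
        return {"source": "provider_b", "id": external_id}

class GenericProvider:
    """Abstract interface over the external sources; picks the concrete
    provider class based on the model's `provider` field."""
    _providers = {"provider_a": ProviderA(), "provider_b": ProviderB()}

    def retrieve(self, listing):
        return self._providers[listing.provider].retrieve(listing.external_id)
```

In the DRF version, whatever `GenericProvider.retrieve` returns would then be passed through a serializer as usual.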

Hope this helps!

Note: the posts on this site follow the CC BY-SA 4.0 license; if you repost, please credit the original source.
