
How to throttle API requests using PHP

We plan to use the SEMrush API, which provides access to SEO data for domain names and search keywords. Under their Terms of Use, they limit usage to avoid overloading their servers:

You may not perform more than 10 requests per second, nor more than 2 simultaneous requests.

We are going to build a simple tool in PHP that aggregates data based on a domain name, and we are looking for the basics of how to meet that requirement. We are planning for hundreds or thousands of potentially simultaneous users.

Maybe someone can provide some pseudocode in PHP that would let us do this. Or is it really as simple as forcing the actual API request function to sleep for 1 second between calls? I don't have much experience with APIs and large numbers of concurrent users, so any help is appreciated.

PHP is really not the best language for concurrent programming. However, there are some third-party solutions that you can use alongside PHP to help you achieve your goals.

What you need is a job manager or a queue system that can handle the actual requests for you. Since this is a back-end tool (at least that's what I gathered from your question), PHP doesn't need to control the jobs itself. Instead, have some controlling process schedule the individual jobs and hand them to your PHP scripts, so that you can effectively impose these limits.

My first suggestion would be to try something like gearman, which is a great job manager and has a PHP extension to help you interface with the library.

Another suggestion is to take a look at queue systems like amqp or zmq, some of which also have PHP extensions.

So here's an example scenario for you...

You have a PHP script that accepts these requests and hands them off to your job manager or queue over a socket. The job manager or queue stores each request and distributes it to the individual workers in a way that can be centralized and controlled to impose these limits. There are some examples in the links I gave you that can help you get there. However, doing it purely in PHP without the aid of these tools will prove quite tricky and could end up with some buggy edge-case behavior if not carefully crafted and considered.
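A rough, in-process sketch of that scenario follows. A real deployment would put gearman or a message queue between the front-end script and the worker; here an SplQueue stands in for the queue. The names RateLimitedWorker, $clock and $sleep are hypothetical and exist only for illustration. Because a single worker drains the queue sequentially, at most one request is in flight at a time, and a sliding one-second window keeps the start rate at or below the limit.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical worker: drains a queue of jobs while starting at most
 * $maxPerSecond jobs within any one-second window.
 * Assumes $maxPerSecond >= 1.
 */
final class RateLimitedWorker
{
    /** @var float[] start times of jobs inside the current window */
    private array $starts = [];
    /** @var callable returns the current time in seconds */
    private $clock;
    /** @var callable pauses for the given number of seconds */
    private $sleep;

    public function __construct(
        private int $maxPerSecond,
        ?callable $clock = null,
        ?callable $sleep = null
    ) {
        $this->clock = $clock ?? static fn(): float => microtime(true);
        $this->sleep = $sleep ?? static function (float $s): void {
            usleep((int) ceil($s * 1_000_000));
        };
    }

    /** Runs every queued job (a callable) and returns the results in order. */
    public function run(SplQueue $jobs): array
    {
        $results = [];
        while (!$jobs->isEmpty()) {
            $this->waitForSlot();
            $this->starts[] = ($this->clock)();
            $results[] = ($jobs->dequeue())();
        }
        return $results;
    }

    private function waitForSlot(): void
    {
        $now = ($this->clock)();
        // Forget starts that have left the one-second window.
        $this->starts = array_values(array_filter(
            $this->starts,
            static fn(float $t): bool => $now - $t < 1.0
        ));
        if (count($this->starts) >= $this->maxPerSecond) {
            // Wait until the oldest start in the window is one second old.
            ($this->sleep)(1.0 - ($now - $this->starts[0]));
            array_shift($this->starts);
        }
    }
}

// Usage: the jobs are placeholders; real ones would call the API.
$jobs = new SplQueue();
foreach (['example.com', 'example.org'] as $domain) {
    $jobs->enqueue(static fn(): string => "fetched $domain");
}
$results = (new RateLimitedWorker(10))->run($jobs);
```

The clock and sleep callables are injectable only so the timing logic can be exercised without real delays; in production the defaults would be used.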

Some APIs return rate limit information in the response headers. Check out: Examples of HTTP API Rate Limiting HTTP Response headers. This information helps you wait just long enough before continuing with your next request, using PHP's time_nanosleep().
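As a hedged illustration of reading such headers: Retry-After is a standard HTTP header, while X-RateLimit-Remaining and X-RateLimit-Reset are common conventions that a given API may or may not send (whether SEMrush sends them is not confirmed here). The helper name secondsToWait is hypothetical, and X-RateLimit-Reset is assumed to hold a Unix timestamp.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: decide how long to pause before the next request.
 *
 * @param array<string,string> $headers response headers, name => value
 * @param int                  $now     current Unix time
 */
function secondsToWait(array $headers, int $now): int
{
    // Header names are case-insensitive, so normalise them first.
    $h = array_change_key_case($headers, CASE_LOWER);

    // Retry-After in its delay-seconds form (the HTTP-date form is ignored here).
    if (isset($h['retry-after'])
        && preg_match('/^\s*(\d+)\s*$/', $h['retry-after'], $m) === 1) {
        return (int) $m[1];
    }

    // Assumed convention: no calls left until the reset timestamp.
    if (isset($h['x-ratelimit-remaining'], $h['x-ratelimit-reset'])
        && (int) $h['x-ratelimit-remaining'] <= 0) {
        return max(0, (int) $h['x-ratelimit-reset'] - $now);
    }

    return 0;
}

// Usage with made-up header values; pass the result to sleep() when it is > 0.
$wait = secondsToWait(
    ['X-RateLimit-Remaining' => '0', 'X-RateLimit-Reset' => '1700000005'],
    1700000000
);
```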

Some PHP libraries go pretty in-depth with their approach to rate limiting. The token bucket algorithm is pretty common across the web: https://github.com/bandwidth-throttle/token-bucket
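To make the idea concrete, here is a minimal single-process token bucket. It is not the API of the library linked above; the class name SimpleTokenBucket and its method are hypothetical. The bucket holds up to $capacity tokens, refills at $ratePerSecond, and each request consumes one token. State lives in memory, so this sketch only limits one PHP process; sharing it across processes would need shared storage.

```php
<?php
declare(strict_types=1);

final class SimpleTokenBucket
{
    private float $tokens;
    private float $lastRefill;

    public function __construct(
        private float $capacity,
        private float $ratePerSecond,
        float $now
    ) {
        $this->tokens = $capacity; // start with a full bucket
        $this->lastRefill = $now;
    }

    /**
     * Try to take one token at time $now (in seconds).
     * Returns 0.0 on success, or the number of seconds to wait before retrying.
     */
    public function tryConsume(float $now): float
    {
        // Add the tokens earned since the last call, capped at capacity.
        $elapsed = max(0.0, $now - $this->lastRefill);
        $this->tokens = min(
            $this->capacity,
            $this->tokens + $elapsed * $this->ratePerSecond
        );
        $this->lastRefill = $now;

        if ($this->tokens >= 1.0) {
            $this->tokens -= 1.0;
            return 0.0;
        }
        return (1.0 - $this->tokens) / $this->ratePerSecond;
    }
}

// Usage: allow bursts of up to 10 requests, refilling 10 tokens per second.
$bucket = new SimpleTokenBucket(10.0, 10.0, microtime(true));
while (($wait = $bucket->tryConsume(microtime(true))) > 0.0) {
    usleep((int) ceil($wait * 1_000_000));
}
// ... perform the API request here ...
```

Passing the time in as an argument, rather than reading the clock inside the class, keeps the refill arithmetic easy to check by hand.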

Now, I find this a bit overkill when it comes to throttling a few URL requests that don't have something like X-RateLimit-Remaining in their response headers. API requests in general are usually pretty slow. So I've built the PHP script below.

This PHP script waits for a short interval (one second divided by $requestsInSeconds) on each call, keyed on a $throttlerID. A higher $requestsInSeconds results in shorter wait times. If the same $throttlerID is used across simultaneous requests, each request waits for the others using file locking (flock()).

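    function Throttler(int $requestsInSeconds, string $throttlerID): void {

        // Use flock() on a temp file as a system-wide lock. The OS releases
        // the lock automatically if the process crashes.
        // Mode "c" creates the file if needed without truncating it.
        $fp = fopen(sys_get_temp_dir() . "/$throttlerID", "c");
        if ($fp === false) {
            throw new RuntimeException("Cannot open lock file for throttler '$throttlerID'");
        }

        // An exclusive lock blocks until it is obtained
        if (flock($fp, LOCK_EX)) {

            // Sleep while holding the lock ($requestsInSeconds should be 1 or higher).
            // time_nanosleep() needs an integer below 1,000,000,000 nanoseconds.
            $time_to_sleep = intdiv(999999999, max(1, $requestsInSeconds));
            time_nanosleep(0, $time_to_sleep);

            flock($fp, LOCK_UN); // unlock
        }

        fclose($fp);

    }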

Put the call to Throttler() right before each cURL call. That's it!
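Note that Throttler() releases its lock before the cURL call starts, so it spaces requests out but does not by itself cap how many are in flight at once. One possible way to approach the "no more than 2 simultaneous requests" rule from the question with the same flock() technique is to keep one lock file per slot and hold a slot for the duration of the request. The function names acquireSlot and releaseSlot, the slot name and the file naming scheme are hypothetical, and the sketch assumes a platform where flock() locks conflict between separately opened handles (as on Linux).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: claim one of $maxSlots lock files without blocking.
 * Returns the locked file handle, or null if every slot is taken.
 *
 * @return resource|null
 */
function acquireSlot(string $name, int $maxSlots)
{
    for ($i = 0; $i < $maxSlots; $i++) {
        // Mode "c" creates the file if needed and never truncates it.
        $fp = fopen(sys_get_temp_dir() . "/{$name}.slot{$i}.lock", 'c');
        if ($fp === false) {
            continue;
        }
        // LOCK_NB makes flock() return immediately instead of waiting.
        if (flock($fp, LOCK_EX | LOCK_NB)) {
            return $fp;
        }
        fclose($fp);
    }
    return null;
}

/**
 * Release a slot obtained from acquireSlot().
 *
 * @param resource $fp
 */
function releaseSlot($fp): void
{
    flock($fp, LOCK_UN);
    fclose($fp);
}

// Usage: poll until one of two slots is free, then make the request.
while (($slot = acquireSlot('semrush-api', 2)) === null) {
    usleep(50_000); // all slots busy; retry in 50 ms
}
try {
    // ... perform the cURL request here ...
} finally {
    releaseSlot($slot);
}
```

Holding the slot inside try/finally means it is released even if the request throws, and the operating system drops the lock if the process dies.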
