Can we reliably keep an HTTP/S connection open for a long time?

My team maintains an application (written in Java) that processes long-running batch jobs. These jobs need to run in a defined sequence. Hence, the application starts a socket server on a predefined port to accept job execution requests. It keeps the socket open until the job completes (with success or failure). This way the job scheduler knows when a job ends, and on successful completion it triggers the next job in the predefined sequence. If the job fails, the scheduler sends out an alert.

This is a setup we have had for over a decade. Some jobs run for a few minutes and others take a couple of hours (depending on the volume) to complete. The setup has worked without any issues.

Now we need to move this application to a container platform (Red Hat OpenShift Container Platform, OCP), and the infrastructure policy in place allows only the default HTTPS port to be exposed. The scheduler sits outside OCP and cannot access any port other than the default HTTPS port.

In theory, we could use HTTPS, set the client timeout to a very large duration, and try to mimic the current TCP socket setup. But would this setup be reliable enough, given that HTTP is designed to serve short-lived requests?

There isn't a reliable way to keep a connection alive for a long period over the internet, because of the nodes (routers, load balancers, proxies, NAT gateways, etc.) that may sit between your client and server. They might drop the connection midway under load, some of them will happily ignore your HTTP keep-alive request, and others have an internal maximum connection duration that kills long-running TCP connections. You may find it works for you today, but there is no guarantee it will work tomorrow.

So you'll probably need to submit the job as a short-lived request and check the status via other means:

  • Push-based strategy: send a webhook URL as part of the job submission and have the server call it (possibly with retries) on job completion to notify interested parties.
  • Pull-based strategy: have the server return a job ID on submission, then have the client check the status periodically. Given the range of your job durations, you may want to implement this with some form of exponential backoff up to a certain limit: for example, first check after waiting 2 seconds, then wait 4 seconds before the next check, then 8 seconds, and so on, up to the maximum time you are happy to wait between checks. That way you find out about short job completions sooner and don't check too frequently for long jobs.
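The pull-based strategy with capped exponential backoff could be sketched roughly as follows. This is only an illustration: the class name `BackoffPoller`, the status strings `RUNNING` and `SUCCEEDED`, and the `statusCheck` supplier (standing in for a short HTTPS GET against a hypothetical status endpoint) are all assumptions, not part of the original setup.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class BackoffPoller {

    /** Doubles the delay, but never beyond the cap. */
    public static Duration nextDelay(Duration current, Duration max) {
        Duration doubled = current.multipliedBy(2);
        return doubled.compareTo(max) > 0 ? max : doubled;
    }

    /** Returns the first n waits (in seconds) of the schedule, for inspection. */
    public static List<Long> scheduleSeconds(Duration initial, Duration max, int n) {
        List<Long> waits = new ArrayList<>();
        Duration delay = initial;
        for (int i = 0; i < n; i++) {
            waits.add(delay.getSeconds());
            delay = nextDelay(delay, max);
        }
        return waits;
    }

    /**
     * Waits, checks, and repeats until the status is no longer "RUNNING".
     * statusCheck is a placeholder for a short-lived HTTPS status request
     * (hypothetical; the real call depends on your API).
     */
    public static String pollUntilDone(Supplier<String> statusCheck,
                                       Duration initial, Duration max)
            throws InterruptedException {
        Duration delay = initial;
        while (true) {
            Thread.sleep(delay.toMillis());
            String status = statusCheck.get();
            if (!"RUNNING".equals(status)) {
                return status; // terminal state: success or failure
            }
            delay = nextDelay(delay, max);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Start at 2 s, cap at 60 s: prints [2, 4, 8, 16, 32, 60, 60]
        System.out.println(scheduleSeconds(Duration.ofSeconds(2), Duration.ofSeconds(60), 7));

        // Simulated job that finishes on the third check (millisecond delays for the demo).
        int[] calls = {0};
        String result = pollUntilDone(
                () -> ++calls[0] < 3 ? "RUNNING" : "SUCCEEDED",
                Duration.ofMillis(10), Duration.ofMillis(40));
        System.out.println(result + " after " + calls[0] + " checks");
    }
}
```

In a real scheduler you would probably also want an overall deadline, so that a job whose status never becomes terminal eventually raises an alert instead of being polled forever.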

When you worked with sockets and the TCP protocol, you were in control of how long connections stayed open. With HTTP you control only logical connections, not physical ones. The actual connections are managed by the OS, and usually IT staff can configure all those timeouts. By default, even when you close a logical connection, the real connection is not closed, in anticipation of the next communication. It is closed by the OS and is not controlled by your code. However, even if it closes and your next request comes after that, a new connection is opened transparently. So it doesn't really matter whether it closed or not; it should be transparent to your code. In short, I assume that you can move to HTTP/HTTPS with no problems, but you will have to test and see.
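If you do want to test the long-held request approach, the client side could be configured along these lines with the JDK's `java.net.http` client (Java 11+). This is a minimal sketch: the class name, the endpoint URL, and the three-hour limit are made up for illustration. Note that these timeouts only govern how long the client is willing to wait; they do nothing to stop an intermediate proxy or load balancer from closing an idle connection.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

public class LongRequestSketch {

    /** Client with a short connect timeout; establishing the connection should be quick. */
    public static HttpClient newClient() {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .build();
    }

    /**
     * Request that waits up to maxJobDuration for the response.
     * If no response arrives in time, send() fails with HttpTimeoutException.
     */
    public static HttpRequest jobRequest(URI endpoint, Duration maxJobDuration) {
        return HttpRequest.newBuilder(endpoint)
                .timeout(maxJobDuration)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical endpoint; replace with the route exposed by your application.
        URI endpoint = URI.create("https://jobs.example.com/run/nightly-load");
        HttpRequest request = jobRequest(endpoint, Duration.ofHours(3));
        System.out.println(request.method() + " " + request.uri()
                + " timeout=" + request.timeout().orElseThrow());
        // newClient().send(request, HttpResponse.BodyHandlers.ofString()) would then
        // block until the job responds, the timeout expires, or the connection drops.
    }
}
```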

Also, for other options for server-to-client communication, you can look at my answer to this question: How to continues send data from backend to frontend when something changes

IMHO, you should turn your scheduler into a REST API server. WebSocket isn't effective in this scenario; the connection would be inactive most of the time.

The jobs can be short-lived or long-running. So, when a long-running job fails in the middle, how does the restart happen? Does it start from the beginning again?

In a similar scenario, we had a database to keep track of the progress of the job (number of records successfully processed). So the jobs can resume after a failure. With such a design, another web service can monitor the status of the job by looking at the database, so the main process is not impacted by constant polling from the client.
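A rough sketch of that checkpointing idea is shown below. To keep it self-contained, an in-memory map stands in for the database table; the class, method, and job names are hypothetical, and a real implementation would persist the checkpoint with JDBC or similar.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ResumableJobSketch {

    /** Stand-in for a progress table keyed by job ID (a real setup would use a database). */
    static final Map<String, Integer> PROGRESS = new ConcurrentHashMap<>();

    /**
     * Processes records from the last checkpoint up to total, recording the
     * checkpoint after each record. failAt simulates a crash just before that
     * record index; pass -1 for a run without failure.
     */
    public static void run(String jobId, int total, int failAt) {
        int start = PROGRESS.getOrDefault(jobId, 0);
        for (int i = start; i < total; i++) {
            if (i == failAt) {
                throw new IllegalStateException("simulated failure at record " + i);
            }
            // ... process record i here ...
            PROGRESS.put(jobId, i + 1); // number of records successfully processed
        }
    }

    /** What a separate status service could report by reading the same table. */
    public static String status(String jobId, int total) {
        int done = PROGRESS.getOrDefault(jobId, 0);
        return done >= total ? "SUCCEEDED" : "IN_PROGRESS " + done + "/" + total;
    }

    public static void main(String[] args) {
        try {
            run("nightly-load", 10, 6);          // fails before record 6
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(status("nightly-load", 10)); // IN_PROGRESS 6/10
        run("nightly-load", 10, -1);             // resumes at record 6
        System.out.println(status("nightly-load", 10)); // SUCCEEDED
    }
}
```

This sketch does not distinguish a failed job from one that is still running; a real progress table would likely carry an explicit state column for that, so the scheduler can raise its alert.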

We have had bad experiences with long-standing HTTP/HTTPS connections. We used to schedule short jobs (only a couple of minutes) via HTTP and wait for them to finish and send a response. This worked fine until the jobs got longer (hours) and some network infrastructure closed the inactive connections. We ended up only submitting the request via HTTP, getting an immediate response, and then polling for the result. At the time, the migration was pretty quick for us, but since then we have migrated even further to use "webhooks", e.g. allowing the processor of the job to signal its state back to the server using a known webhook address.
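A webhook-style notification could look roughly like the sketch below, using only JDK classes (Java 11+). Everything specific here is an assumption for illustration: the class name, the `/job-callback` path, the plain-text payload, and the retry count. For a self-contained demo both sides run locally over plain HTTP; in the setup described in the question the callback would go over HTTPS, and the job processor would need to be allowed to reach the scheduler.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class WebhookSketch {

    /** Job side: POST the final state to the callback URL, retrying on failure. */
    public static boolean notifyCompletion(HttpClient client, URI callback,
                                           String body, int maxAttempts)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                HttpRequest request = HttpRequest.newBuilder(callback)
                        .timeout(Duration.ofSeconds(10))
                        .header("Content-Type", "text/plain")
                        .POST(HttpRequest.BodyPublishers.ofString(body))
                        .build();
                HttpResponse<Void> response =
                        client.send(request, HttpResponse.BodyHandlers.discarding());
                if (response.statusCode() / 100 == 2) {
                    return true; // receiver acknowledged the notification
                }
            } catch (IOException e) {
                // network problem: fall through and retry
            }
            Thread.sleep(200L * attempt); // simple linear wait between attempts
        }
        return false;
    }

    /** Scheduler side: minimal receiver that hands the posted body to a future. */
    public static HttpServer startReceiver(CompletableFuture<String> received)
            throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/job-callback", exchange -> {
            String body = new String(exchange.getRequestBody().readAllBytes(),
                    StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(204, -1); // 204 No Content, no response body
            exchange.close();
            received.complete(body);
        });
        server.start();
        return server;
    }

    /** Runs both sides locally and returns what the receiver got. */
    public static String demo() {
        CompletableFuture<String> received = new CompletableFuture<>();
        HttpServer receiver = null;
        try {
            receiver = startReceiver(received);
            URI callback = URI.create("http://127.0.0.1:"
                    + receiver.getAddress().getPort() + "/job-callback");
            HttpClient client = HttpClient.newBuilder()
                    .version(HttpClient.Version.HTTP_1_1)
                    .build();
            notifyCompletion(client, callback, "job-42 SUCCEEDED", 3);
            return received.get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("webhook demo failed", e);
        } finally {
            if (receiver != null) {
                receiver.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because a webhook call can itself be lost, it seems sensible to keep a status endpoint around as a fallback so the scheduler can still poll if no notification arrives.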

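Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.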
