
How to implement a manager of scripts execution in PHP on a remote server

I'm trying to build a service that will collect some data from the web at certain intervals, then parse that data, and finally, depending on the parse results, execute dedicated procedures. A typical run of the service looks like this:

  1. Request the list of items to be updated
  2. Download the data of the listed items
  3. Check what's not updated yet
  4. Update the database
  5. Filter the data that contains updates (get only the highest-priority updates)
  6. Perform some procedures to parse the updates
  7. Filter the data that contains updates (get only the medium-priority updates)
  8. Perform some procedures to parse ...

Everything would be simple if there were not so much data to update. There is so much that at every step from 1 to 8 (except maybe step 1) the scripts will fail due to the 60-second max execution time limit. Even if there were an option to increase it, this would not be optimal, as the primary goal of the project is to deliver the highest-priority data first. Unluckily, determining the priority level of a piece of information requires fetching the majority of all the data and doing a lot of comparisons between the already stored data and the incoming (update) data.

I could give up some service speed in exchange for getting at least the high-priority updates through, and wait longer for all the others. I thought about writing some parent script (a manager) to control every step (1-8) of the service, maybe by executing other scripts? The manager should be able to resume an unfinished step (script) to get it completed. Each step could be written so that it performs a small portion of the work and, after finishing it, marks that portion as done in e.g. an SQL database. After the manager resumes it, the step (script) will continue from the point where it was terminated by the server for exceeding the max execution time.
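A minimal sketch of such a manager, assuming a hypothetical `steps` table (columns `id`, `name`, `status`) and one script per step; the step scripts themselves do a chunk of work and set their `status` to 'done' when complete:

```php
<?php
// manager.php -- a sketch, called once a minute; schema is hypothetical:
// steps(id, name, status) with status in {'pending', 'running', 'done'}.
$pdo = new PDO('mysql:host=localhost;dbname=service', 'user', 'pass'); // placeholder credentials

// Pick the first step of the pipeline that is not finished yet.
$row = $pdo->query(
    "SELECT name FROM steps WHERE status <> 'done' ORDER BY id LIMIT 1"
)->fetch(PDO::FETCH_ASSOC);

if ($row === false) {
    exit; // all steps done, nothing to do this minute
}

// Note which script gets activated, as the requirements below demand.
error_log(date('c') . " manager: activating step {$row['name']}");
$pdo->prepare("UPDATE steps SET status = 'running' WHERE name = ?")
    ->execute([$row['name']]);

// (Re)run the step script; it continues from the last chunk it marked done.
include __DIR__ . '/steps/' . basename($row['name']) . '.php';
```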

Known platform restrictions: a remote server, an unchangeable max execution time, usually a limit of one script running at the same time, lack of access to many Apache features, and all the other restrictions typical of remote hosting.

Requirements: some kind of manager is mandatory, because besides calling the particular scripts, this parent process must write some notes about the scripts that were activated.

The manager can be called by curl; a one-minute interval is enough. Unluckily, feeding curl a list of calls to every step of the service is not an option here.
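For instance, a one-minute cron entry (on any machine that can reach the host, or an external cron service) could trigger the manager over HTTP; the URL below is a placeholder:

```
* * * * * curl -s https://example.com/manager.php >/dev/null 2>&1
```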

I also considered getting a new remote host for every step of the service and controlling them from yet another remote host that could call them and ask them to do their job, e.g. via SOAP, but this scenario is at the end of my list of wished-for solutions, because it does not solve the max execution time problem and it brings a lot of data exchange over the global net, which is the slowest way to work on data.

Any thoughts on how to implement a solution?

I don't see how steps 2 and 3 by themselves can take over 60 seconds. If you use curl_multi_exec for step 2, it will run in seconds. If your script were running over 60 seconds at step 3, you would hit "memory limit exceeded" instead, and a lot earlier.
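A minimal curl_multi_exec sketch for step 2 (the URL list is a placeholder):

```php
<?php
// Download many items in parallel instead of one by one.
$urls = ['https://example.com/item/1', 'https://example.com/item/2']; // placeholder list

$mh = curl_multi_init();
$handles = [];
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// Drive all transfers concurrently until every handle has finished.
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // block until there is activity, instead of busy-looping
} while ($running > 0);

foreach ($handles as $url => $ch) {
    $body = curl_multi_getcontent($ch); // downloaded data for this item
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
    // ... hand $body to step 3 ("check what's not updated yet")
}
curl_multi_close($mh);
```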

All of that leads me to the conclusion that the script is very unoptimized. The solution would be to:

  1. Break the task into (a) deciding what to update, and saving that in the database (say, flag 1 for what to update, 0 for what not to); and (b) cycling through the rows that need an update, updating them, and setting the flag back to 0. At ~50 seconds, just shut down (assuming the script is run every few minutes, that will work; see the sketch after this list).

  2. Get a second server and set it up with a proper execution time limit so it can run your script for hours. Since it will have access to your first database directly (and not via HTTP calls), it won't be a major traffic increase.
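A minimal sketch of the time-boxed loop from point 1(b); the `items` table, its `needs_update` flag, the `priority` column, and `update_item()` are all hypothetical:

```php
<?php
// Process flagged rows until ~50 s have passed, then exit;
// the next run resumes exactly where this one stopped.
$pdo = new PDO('mysql:host=localhost;dbname=service', 'user', 'pass'); // placeholder credentials
$deadline = time() + 50; // stay under the 60 s limit

// Hypothetical per-row routine: fetch fresh data, compare, write changes.
function update_item(PDO $pdo, int $id): void
{
    // ... dedicated parse/update procedures for one item ...
}

while (time() < $deadline) {
    // Highest-priority rows first, matching the project's primary goal.
    $row = $pdo->query(
        "SELECT id FROM items WHERE needs_update = 1 ORDER BY priority DESC LIMIT 1"
    )->fetch(PDO::FETCH_ASSOC);
    if ($row === false) {
        break; // nothing left to update
    }

    update_item($pdo, (int)$row['id']);

    // Clear the flag so a restarted run never repeats finished work.
    $pdo->prepare("UPDATE items SET needs_update = 0 WHERE id = ?")
        ->execute([$row['id']]);
}
```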
