
PHP exec wget nohup AND write to log in real time

I have a PHP script that is called via an AJAX request from a web app. Basically, I have a JS event listener that watches a log file (almost like a tail) and prints its contents to the screen. This works.

The problem is that I can write to the log in real time using something like tee, but then I have to keep the script open, and some browsers time out on long downloads --

or I can use nohup and > /dev/null & to fire off the download (just a quick AJAX call) so the script isn't kept open in the browser.

But I can't seem to do both.

DESIRED OUTCOME

Webpage --> ajax call --> ajax return --> shell still running --> js listener looking at wget logs and displays to console in real time

I have tried this (PHP):

 $url = "internaldomain.com/some-download";
 $user_file = "/some/user/directory/";
 exec("nohup wget --random-wait -r -p -e robots=off -P $user_file -U mozilla $url > /dev/null &");

This works with the AJAX call: it returns without waiting. Good.

I have tried this as well:

 $url = "internaldomain.com/some-download";
 $user_file = "/some/user/directory/";
 exec("nohup wget --random-wait -r -p -e robots=off -P $user_file -U mozilla $url 2>&1 | tee -a wget_log");

which writes to the log file in real time -- but my AJAX call still holds the connection open.

Is there a happy marriage between the two, where I get real-time logging (not just output written at the conclusion of wget) and can start the shell command without waiting for it to finish, so the rest of my JS can move on and start "listening" to the log file?
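In other words, I'm hoping something along these lines would work (untested -- it's just my two attempts above merged, appending both of wget's output streams to the log and backgrounding the whole thing):

 $url = "internaldomain.com/some-download";
 $user_file = "/some/user/directory/";
 $log_file = "/some/user/directory/wget_log";
 // wget reports progress on stderr, so send both streams to the log,
 // and the trailing & should let exec() return immediately.
 exec("nohup wget --random-wait -r -p -e robots=off -P $user_file -U mozilla $url >> $log_file 2>&1 &");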

The end result, with the AJAX listener, should "look" like a real-time console -- but web based.

UPDATE We host about 600 websites. This is an internal program that hits a website via port 80 and recursively packages it up (wget). Our support staff get no indication of what's happening during the wget. So I want to start the script via a browser, get the site downloading to a specified directory, and create a "real-time" log so that our support staff are apprised of the progress of the wget. Lastly, when finished, it presents a download link. To better understand what I am trying to accomplish, please see the image below:

[Image: application user interface view]

If you want something to happen outside the context of a web request, you shouldn't be forking it off of one like this. All it takes is one malicious party, or even one non-malicious party hitting F5 a few extra times, to spawn enough processes to seriously impact or kill the server.

Ideally, I would suggest segregating the user-facing components from those running tasks with a queue system. Eg:

  • Client-facing:
    1. User clicks 'scrape foo.com' button.
    2. The backing application enqueues a 'scrape foo.com' job with a unique ID.
    3. The unique ID is returned to the client to be used to monitor progress.
  • Server-side:
    1. Configure a process manager like supervisord to run one or more worker tasks.
    2. The worker tasks poll the queue for work, and do it (see the sketch after this list).
      • Eg: scrape foo.com and log to log/$unique_id.log
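A rough sketch of such a worker, assuming a database-backed queue (the scraper database, the jobs table, and its id/url/status columns are placeholders for whatever queue you actually use):

 <?php
 // worker.php -- run under supervisord; it polls the queue and handles one job at a time.
 $db = new PDO('mysql:host=localhost;dbname=scraper', 'user', 'pass');

 while (true) {
     $job = $db->query("SELECT id, url FROM jobs WHERE status = 'queued' ORDER BY id LIMIT 1")
               ->fetch(PDO::FETCH_ASSOC);
     if (!$job) {
         sleep(5);   // nothing queued; wait and poll again
         continue;
     }

     $db->prepare("UPDATE jobs SET status = 'running' WHERE id = ?")->execute([$job['id']]);

     $log  = sprintf('log/%d.log', $job['id']);
     $dest = sprintf('downloads/%d/', $job['id']);
     $cmd  = sprintf(
         'wget --random-wait -r -p -e robots=off -P %s -U mozilla %s >> %s 2>&1',
         escapeshellarg($dest),
         escapeshellarg($job['url']),
         escapeshellarg($log)
     );
     exec($cmd);     // blocks until wget finishes; its progress streams into the per-job log

     $db->prepare("UPDATE jobs SET status = 'done' WHERE id = ?")->execute([$job['id']]);
 }

No nohup or & is needed here: the worker itself is the long-running process, supervisord keeps it alive, and blocking on wget is exactly what it's for.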

As for the "real time" log output, IMHO true real-time [while possible] would be overkill, and is more difficult than you might think to just continuously output from PHP as there are several troublesome layers of buffering all the from PHP CGI SAPIs, through your httpd, and down into the TCP stack. The simplest method would be to simply trigger a periodic refresh via Javascript, or you could go a step further and keep track of the offset in the logfile that you last ended at and request only bytes after that offset, merging it with what you've already received.

If you're truly hell-bent on "real-time" updates then you should look into WebSockets, eg: Ratchet, though I would still say that this is overkill for what you want to accomplish.
