
nodejs exec Wget command

I'm writing a Node.js application that downloads entire websites using the Unix wget command, but some URLs inside the downloaded pages are wrong: .html is appended to the end of the file names, for example:

    <img src="images/photo.jpeg.html">
    <script src="js/scripts.js.html">

The code I'm using is the following:

    var util = require('util'),
        exec = require('child_process').exec,
        child,
        url = 'http://www.example.com/';

    // Mirror the site into /destination_folder/ by running wget through a shell
    child = exec('wget --mirror -p --convert-links --html-extension -e robots=off -P /destination_folder/ ' + url,
      function (error, stdout, stderr) {
        // exec only calls back once the process has finished
        console.log('stdout: ' + stdout);
        console.log('stderr: ' + stderr);
        if (error !== null) {
          console.log('exec error: ' + error);
        }
      });

NB: If I run this command directly in the Unix shell, it works correctly:

    wget --mirror -p --html-extension --convert-links -e robots=off -P . http://www.example.com

Edit: this is the log printed after running the Node.js script:

    --2017-04-04 11:49:49--  http://www.example.com/css/style.min.css
    Reusing existing connection to www.example.com:80.
    HTTP request sent, awaiting response... 304 Not Modified
    File ‘/destination_folder/www.example.com/css/style.min.css.html’ not modified on server. Omitting download.

    FINISHED --2017-04-04 11:50:11--
    Total wall clock time: 22s
    Downloaded: 50 files, 1.2M in 1.4s (855 KB/s)
    /destination_folder/www.example.com/css/style.min.css.html: No such file or directory
    Converting links in /destination_folder/www.example.com/css/style.min.css.html... nothing to do.
    exec error: Error: stderr maxBuffer exceeded

I don't understand where the problem is. Could you help me, please?

Thank you

exec collects the child process's stdout and stderr in in-memory buffers, and those buffers have a limited size (set by the maxBuffer option).
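If you want to keep using exec, that limit can be raised through the maxBuffer option. Below is a minimal sketch, assuming a hypothetical helper execWithLargerBuffer, an arbitrary 64 MiB limit, and a made-up command-line interface (node mirror-exec.js <url> <destination_folder>). The URL and folder are pasted into a shell string unquoted, so only use it with trusted values.

```javascript
const { exec } = require('child_process');

// Hypothetical helper: run a shell command with exec, but with a larger
// output buffer. maxBuffer is the largest number of bytes exec will collect
// on stdout or stderr; if either stream exceeds it, Node kills the child and
// reports an error. The 64 MiB default here is an arbitrary assumption.
function execWithLargerBuffer(command, maxBuffer = 64 * 1024 * 1024) {
  return new Promise((resolve, reject) => {
    exec(command, { maxBuffer }, (error, stdout, stderr) => {
      if (error) {
        // Non-zero exit codes and buffer overflows both end up here.
        reject(error);
      } else {
        resolve({ stdout, stderr });
      }
    });
  });
}

// Assumed usage: node mirror-exec.js http://www.example.com/ /destination_folder/
const [url, destDir] = process.argv.slice(2);
if (url && destDir) {
  // Same wget flags as in the question; the values are interpolated unquoted.
  const cmd = `wget --mirror -p --convert-links --html-extension -e robots=off -P ${destDir} ${url}`;
  execWithLargerBuffer(cmd)
    .then(({ stderr }) => console.log(`wget finished, ${stderr.length} characters of log on stderr`))
    .catch((err) => console.error('exec error: ' + err));
}
```

The whole log still sits in memory, and a large enough site can exceed any fixed limit, so this is a stopgap rather than a real fix.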

If the command produces a lot of output, the buffer can run out of space. Here it is wget's log on stderr that overflows (wget logs every request while mirroring a site), as the "stderr maxBuffer exceeded" error shows. Try using spawn instead of exec: it streams the output instead of collecting it. For reference: Difference between spawn and exec of Node.js
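A minimal sketch of the spawn approach, assuming a hypothetical helper runStreaming and a made-up command-line interface (node mirror.js <url> <destination_folder>). The wget flags are the ones from the question; passing them as an argument array means no shell is involved, so nothing needs quoting. By default, each chunk of wget's stderr log is forwarded to the parent's stderr as it arrives instead of being collected.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: run a command without a shell and hand its stderr to
// onStderr chunk by chunk instead of buffering it, so there is no size limit.
// stdout is inherited (shared with this process); stdin is not used.
// Resolves with the exit code (null if the child was killed by a signal);
// rejects if the process could not be started at all (e.g. ENOENT).
function runStreaming(command, args, onStderr = (chunk) => process.stderr.write(chunk)) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { stdio: ['ignore', 'inherit', 'pipe'] });
    child.stderr.on('data', onStderr);
    child.on('error', reject);
    // 'close' fires after the process exits and its stdio streams are closed.
    child.on('close', (code) => resolve(code));
  });
}

// Assumed usage: node mirror.js http://www.example.com/ /destination_folder/
const [url, destDir] = process.argv.slice(2);
if (url && destDir) {
  const args = ['--mirror', '-p', '--convert-links', '--html-extension',
                '-e', 'robots=off', '-P', destDir, url];
  runStreaming('wget', args)
    .then((code) => console.log(`wget exited with code ${code}`))
    .catch((err) => console.error('spawn error: ' + err));
}
```

A non-zero exit code is not necessarily fatal: wget returns 8, for example, when some requests got server error responses such as 404, which is common when mirroring a site.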
