nodejs exec Wget command
I'm writing a Node.js application that downloads entire websites using the Unix "wget" command, but I have a problem with some URLs inside the downloaded pages: .html appears at the end of the file names, e.g.
<img src="images/photo.jpeg.html"> or <script src="js/scripts.js.html">
The code I'm using is the following:
var util = require('util'),
    exec = require('child_process').exec,
    child,
    url = 'http://www.example.com/';

child = exec('wget --mirror -p --convert-links --html-extension -e robots=off -P /destination_folder/ ' + url,
    function (error, stdout, stderr) {
        console.log('stdout: ' + stdout);
        console.log('stderr: ' + stderr);
        if (error !== null) {
            console.log('exec error: ' + error);
        }
    });
NB: If I run this command (wget --mirror -p --html-extension --convert-links -e robots=off -P . http://www.example.com) directly in the Unix shell, it works correctly.
Edit: this is the log returned after running the Node.js script:
--2017-04-04 11:49:49-- http://www.example.com/css/style.min.css
Reusing existing connection to www.example.com:80.
HTTP request sent, awaiting response... 304 Not Modified
File ‘/destination_folder/www.example.com/css/style.min.css.html’ not modified on server. Omitting download.
FINISHED --2017-04-04 11:50:11--
Total wall clock time: 22s
Downloaded: 50 files, 1.2M in 1.4s (855 KB/s)
/destination_folder/www.example.com/css/style.min.css.html: No such file or directory
Converting links in /destination_folder/www.example.com/css/style.min.css.html... nothing to do.
exec error: Error: stderr maxBuffer exceeded
I don't understand where the problem is. Could you help me, please?
Thank you
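exec buffers the child's stdout and stderr in memory, and that buffer is limited (the maxBuffer option). If the command produces a lot of output, the buffer runs out of space; that is the "stderr maxBuffer exceeded" error at the end of your log. wget writes its log to stderr, so mirroring a whole site fills the buffer quickly. When the limit is exceeded, Node terminates the child process, which is likely why wget does not finish its work the way it does in the shell. Try using spawn instead of exec. For your reference: Difference between spawn and exec of Node.js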
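As a rough illustration of the spawn suggestion, here is a minimal sketch that reuses the wget flags from the question. The helper names buildWgetArgs and mirrorSite are hypothetical, not part of Node.js or the original code, and the sketch assumes wget is available on the PATH. Output is streamed chunk by chunk instead of being collected, so no maxBuffer limit applies.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: builds the same wget arguments as in the question.
// Passing them as an array means no shell is involved, so the URL is not
// subject to shell interpolation.
function buildWgetArgs(url, destDir) {
  return [
    '--mirror',
    '-p',
    '--convert-links',
    '--html-extension',
    '-e', 'robots=off',
    '-P', destDir,
    url,
  ];
}

// Hypothetical helper: runs wget with spawn and forwards its output as it
// arrives instead of buffering it in memory.
function mirrorSite(url, destDir, onDone) {
  const child = spawn('wget', buildWgetArgs(url, destDir));
  let finished = false;

  // Make sure the callback runs only once, even if both 'error' and
  // 'close' are emitted.
  function finish(err) {
    if (finished) return;
    finished = true;
    onDone(err);
  }

  child.stdout.on('data', (chunk) => process.stdout.write(chunk));
  child.stderr.on('data', (chunk) => process.stderr.write(chunk));

  // 'error' fires when the process cannot be started (e.g. wget missing).
  child.on('error', (err) => finish(err));

  // 'close' fires once the process has ended and its streams are closed.
  child.on('close', (code, signal) => {
    if (code === 0) {
      finish(null);
    } else {
      finish(new Error('wget ended with code ' + code + ', signal ' + signal));
    }
  });

  return child;
}

// Example call (commented out so that loading this file does not start wget):
// mirrorSite('http://www.example.com/', '/destination_folder/', (err) => {
//   if (err) console.log('wget error: ' + err.message);
// });
```

Note that wget can exit with a non-zero code (for example 8, when the server returned an error for some request) even though most of the site was mirrored, so a non-zero code here does not necessarily mean the whole download failed.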