I'm writing a Node.js application that downloads entire websites using the Unix wget command, but some URLs inside the downloaded pages end up with .html appended to the file name, e.g.
<img src="images/photo.jpeg.html"> or <script src="js/scripts.js.html">
The code I'm using is the following:
// Mirror the site with wget; the callback runs once wget has exited
var exec = require('child_process').exec,
    child,
    url = 'http://www.example.com/';

child = exec('wget --mirror -p --convert-links --html-extension -e robots=off -P /destination_folder/ ' + url,
  function (error, stdout, stderr) {
    // exec() delivers everything wget printed in one go, after it exits
    console.log('stdout: ' + stdout);
    console.log('stderr: ' + stderr);
    if (error !== null) {
      console.log('exec error: ' + error);
    }
  });
NB: If I run the same command (wget --mirror -p --html-extension --convert-links -e robots=off -P . http://www.example.com) directly in a Unix shell, it works correctly.
Edit: this is the log output after running the Node.js script:
--2017-04-04 11:49:49-- http://www.example.com/css/style.min.css
Reusing existing connection to www.example.com:80.
HTTP request sent, awaiting response... 304 Not Modified
File ‘/destination_folder/www.example.com/css/style.min.css.html’ not modified on server. Omitting download.
FINISHED --2017-04-04 11:50:11--
Total wall clock time: 22s
Downloaded: 50 files, 1.2M in 1.4s (855 KB/s)
/destination_folder/www.example.com/css/style.min.css.html: No such file or directory
Converting links in /destination_folder/www.example.com/css/style.min.css.html... nothing to do.
exec error: Error: stderr maxBuffer exceeded
I don't understand where the problem is. Could you help me, please?
Thank you
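exec buffers the child process's stdout and stderr in memory, and that buffer is limited in size (the maxBuffer option). If the limit is exceeded, Node terminates the child process, which is what the "stderr maxBuffer exceeded" error at the end of your log means.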
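The buffer limit can be seen in a minimal sketch, assuming Node.js with CommonJS modules. runWithExec is a hypothetical helper; the maxBuffer value it forwards (in bytes) is a real exec option, but the 50 MB figure in the comment is only an illustrative assumption. Raising the limit may work around the error, yet the whole output is still held in memory, so it only moves the ceiling.

```javascript
const { exec } = require('child_process');

// Hypothetical helper: run a shell command through exec(), which collects
// the child's entire stdout and stderr in memory before calling back.
// If either stream grows past `maxBuffer` bytes, Node terminates the child
// and the callback receives an error whose message mentions maxBuffer.
function runWithExec(command, maxBuffer, callback) {
  exec(command, { maxBuffer: maxBuffer }, function (error, stdout, stderr) {
    callback(error, stdout, stderr);
  });
}

// Example (assumed limit): the question's wget command with a 50 MB buffer.
// runWithExec('wget --mirror -p --convert-links --html-extension -e robots=off -P /destination_folder/ ' + url,
//   50 * 1024 * 1024,
//   function (error, stdout, stderr) {
//     if (error !== null) console.log('exec error: ' + error);
//   });
```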
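wget writes its progress log to stderr, so mirroring a site with many files can easily fill that buffer, regardless of how big the individual files are. Try using spawn instead of exec: it streams the output as it arrives instead of collecting it. For reference: Difference between spawn and exec of Node.js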
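A minimal sketch of the spawn approach, assuming Node.js with CommonJS modules. The helper name runWithSpawn and its callback signature are hypothetical; the wget arguments in the trailing comment are the question's own options, passed as an array so no shell is involved.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: run `command` with `args` via spawn(). Output arrives
// chunk by chunk on the child's stdout/stderr streams and is handed to
// onData immediately, so nothing piles up in a maxBuffer-limited buffer.
function runWithSpawn(command, args, onData, callback) {
  const child = spawn(command, args);
  let finished = false;

  // Call the final callback only once, even if both 'error' and 'close' fire.
  function done(error, code) {
    if (finished) return;
    finished = true;
    callback(error, code);
  }

  child.stdout.on('data', function (chunk) { onData('stdout', chunk); });
  // wget writes its progress log to stderr.
  child.stderr.on('data', function (chunk) { onData('stderr', chunk); });

  // 'error' fires if the program could not be started (e.g. not installed).
  child.on('error', function (error) { done(error, null); });

  // 'close' fires after the process has exited and its streams have closed.
  child.on('close', function (code, signal) {
    if (code === 0) {
      done(null, code);
    } else {
      done(new Error(command + ' failed (code ' + code + ', signal ' + signal + ')'), code);
    }
  });
}

// Example usage with the question's options (url as defined in the question):
// runWithSpawn('wget',
//   ['--mirror', '-p', '--convert-links', '--html-extension', '-e', 'robots=off',
//    '-P', '/destination_folder/', url],
//   function (stream, chunk) { process.stderr.write(chunk); },
//   function (error) { if (error) console.log('wget error: ' + error); });
```

When mirroring, wget can exit with a non-zero status if some pages return server errors (for example a 404), so the exit code is worth logging rather than treating as a hard failure.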