I used get
method of request
module to get content of external site. If encoding of external site is utf-8, it is ok, but it has display error with other encodings such as shift-jis
function getExternalUrl(request, response, url){
mod_request.get(url, function (err, res, body) {
//mod_request.get({uri: url, encoding: 'binary'}, function (err, res, body) {
if (err){
console.log("\terr=" + err);
}else{
var result = res.body;
// Process res.body
response.write(result);
}
response.end();
});
}
How can I get content of external site with correct encoding?
I found the way to do:
Get with binary
encoding
var mod_request = require('request');
mod_request.get({ uri: url, encoding: 'binary', headers: headers }, function(err, res, body) {});
Create a Buffer
with binary
format
var contentBuffer = new Buffer(res.body, 'binary');
Get real encoding of page by detect-character-encoding
npm
var mod_detect_character_encoding = require('detect-character-encoding');
var charsetMatch = mod_detect_character_encoding(contentBuffer);
Convert page to utf-8
by iconv
npm
var mod_iconv = require('iconv').Iconv;
var iconv = new mod_iconv(charsetMatch.encoding, 'utf-8');
var result = iconv.convert(contentBuffer).toString();
P/S: This way is only applied for text file (html, css, js). Please do not apply for image file or others which is not text
This way is only applied for text file (html, css, js). Please do not apply for image file or others which is not text
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.