
Node.js: wrong encoding when getting an external site's content

I used the `get` method of the `request` module to fetch the content of an external site. If the site is encoded in UTF-8 it works fine, but pages in other encodings, such as Shift_JIS, are displayed garbled.

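    var mod_request = require('request');

    function getExternalUrl(request, response, url) {
        // By default `request` decodes the body as UTF-8, so pages served in
        // Shift_JIS (or any other non-UTF-8 charset) come out garbled.
        mod_request.get(url, function (err, res, body) {
        // Also tried: mod_request.get({uri: url, encoding: 'binary'}, function (err, res, body) {
            if (err) {
                console.log("\terr=" + err);
            } else {
                var result = res.body; // same value as `body`
                // Process res.body
                response.write(result);
            }
            response.end();
        });
    }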
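The garbling happens because `request` decodes the response body as UTF-8 unless told otherwise. The following minimal sketch uses only Node's built-in `Buffer` and makes no network call. It shows the same effect on a smaller scale. The ISO-8859-1 bytes for "café" are not valid UTF-8, so the last byte turns into the replacement character U+FFFD. The variable names are illustrative only.

```javascript
// Bytes for "café" as an ISO-8859-1 (latin1) server would send them.
// 0xE9 is "é" in latin1 but is an incomplete sequence in UTF-8.
const latin1Bytes = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

// Decoding with the wrong charset (what a UTF-8 default does) garbles the text:
// the value is "caf" followed by U+FFFD.
const wrong = latin1Bytes.toString('utf8');

// Decoding with the charset the bytes were actually written in recovers "café".
const right = latin1Bytes.toString('latin1');
```

Shift_JIS pages fail the same way, only with many more characters affected.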

How can I get the content of an external site with the correct encoding?

I found a way to do it:

  1. Get the page with binary encoding, so that `request` does not decode it as UTF-8

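    var mod_request = require('request');
    // `headers` holds whatever request headers you need to send (e.g. User-Agent)
    mod_request.get({ uri: url, encoding: 'binary', headers: headers }, function (err, res, body) {
        // steps 2-4 go here
    });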

  2. Create a Buffer from the binary string

    // Buffer.from() replaces the deprecated `new Buffer()` constructor
    var contentBuffer = Buffer.from(res.body, 'binary');

  3. Detect the page's real encoding with the detect-character-encoding npm package

    var mod_detect_character_encoding = require('detect-character-encoding');
    var charsetMatch = mod_detect_character_encoding(contentBuffer);

  4. Convert the page to UTF-8 with the iconv npm package

    var mod_iconv = require('iconv').Iconv;
    var iconv = new mod_iconv(charsetMatch.encoding, 'utf-8');
    var result = iconv.convert(contentBuffer).toString();
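On recent Node.js versions, the steps above can also be approximated without native modules. This is an alternative, not the method described above. It swaps `iconv` and `detect-character-encoding` for Node's built-in `TextDecoder`. It also reads the charset from the `Content-Type` response header instead of detecting it statistically, and falls back to UTF-8. It assumes the following:

- Node 18+ for the global `fetch` (the `request` module has been deprecated since 2020).
- A Node build with full ICU, which official binaries include, so that `TextDecoder` supports legacy charsets such as Shift_JIS.

The names `charsetFromContentType`, `decodeBody` and `fetchText` are hypothetical.

```javascript
// Hypothetical helper: pull the charset parameter out of a Content-Type value,
// e.g. 'text/html; charset=Shift_JIS' -> 'shift_jis'. Returns null if absent.
function charsetFromContentType(contentType) {
  if (!contentType) return null;
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType);
  return match ? match[1].toLowerCase() : null;
}

// Hypothetical helper: decode raw response bytes into a JS string.
// TextDecoder throws a RangeError for labels it does not recognise, so an
// unknown charset falls back to UTF-8 instead of crashing the request.
function decodeBody(bytes, contentType) {
  const charset = charsetFromContentType(contentType) || 'utf-8';
  let decoder;
  try {
    decoder = new TextDecoder(charset);
  } catch (e) {
    decoder = new TextDecoder('utf-8');
  }
  return decoder.decode(bytes);
}

// Hypothetical wrapper: fetch a page as raw bytes (never as a pre-decoded
// string) and decode it with the charset the server declared.
async function fetchText(url) {
  const res = await fetch(url);
  const bytes = new Uint8Array(await res.arrayBuffer());
  return decodeBody(bytes, res.headers.get('content-type'));
}
```

Many pages declare their charset only in a `<meta charset>` tag and not in the HTTP header. For those pages, a detection step like step 3 above is still the more robust choice.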

P.S.: This approach only applies to text files (HTML, CSS, JS). Do not apply it to images or other non-text files.
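Given that caveat, one way to keep binary responses away from the decoder is to check the media type in `Content-Type` first. The following is a sketch only. The `isTextContentType` name and the set of non-`text/*` types treated as text are assumptions for illustration, not an exhaustive rule.

```javascript
// Hypothetical list of non-"text/*" media types that still carry text.
const TEXT_LIKE_TYPES = new Set([
  'application/javascript',
  'application/json',
  'application/xml',
  'application/xhtml+xml',
  'image/svg+xml',
]);

// Returns true when the body should be decoded to a string (HTML, CSS, JS...)
// and false for images and other binary payloads, which should be passed
// through as raw bytes untouched.
function isTextContentType(contentType) {
  if (!contentType) return false;
  // Drop parameters such as "; charset=..." and normalise case.
  const mediaType = contentType.split(';')[0].trim().toLowerCase();
  return mediaType.startsWith('text/') || TEXT_LIKE_TYPES.has(mediaType);
}
```

In a proxy like `getExternalUrl`, a `false` result would mean writing the raw buffer straight to `response` without any charset conversion.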
