Node.js convert string from ISO-8859-2 to UTF-8

When I download page content with the Node.js request module and the content is encoded in ISO-8859-2, I cannot convert it to UTF-8.

I am using node-iconv for the conversion.

Code:

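const request = require('request');
const { Iconv } = require('iconv');

request('https://www.jakpsatweb.cz', function(err, resp, body){
    // regexToRetrieveTitle (not shown) extracts the page title from the body
    const title = regexToRetrieveTitle(body);
    const iconv = new Iconv('ISO-8859-2', 'UTF-8');
    const buffer = iconv.convert(title);
    console.log(buffer);
    console.log(buffer.toString('UTF8'));
});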

Console:

<Buffer 52 65 6b 6c 61 6d 61 3a 20 6a 61 6b 20 66 75 6e 67 75 6a 65 20 77 65 62 6f 76 c4 8f c5 bc cb 9d 20 72 65 6b 6c 61 6d 61>
Reklama: jak funguje webovďż˝ reklama

Expected result:

Reklama: jak funguje webová reklama

Does anyone know where the problem is?

EDIT:

For example, I download this page. I identified the encoding as ISO-8859-2 from its meta tags (the Chrome browser detects the same encoding), and I need to convert the page content and save it to a database. My database uses UTF-8, so I need to convert the content.

The conversion from ISO-8859-2 to UTF-8 worked fine. It was the input (the title variable) that had the wrong contents: the title contained the bytes EF BF BD. Read as ISO-8859-2, those three bytes are ď, ż and ˝, which is exactly what the output buffer holds as c4 8f c5 bc cb 9d. This means that the title was already UTF-8 encoded, but with a U+FFFD (REPLACEMENT CHARACTER) in the place where you would expect the letter á (LATIN SMALL LETTER A WITH ACUTE).

Now, the original web page https://www.jakpsatweb.cz/reklama/index.html is correctly encoded in ISO-8859-2 and also has the required charset declaration in the <head> section.

Therefore the problem must be either in the software that downloads the web page (Node.js) or in the regexToRetrieveTitle function.
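The byte-level reasoning above can be reproduced with Node's built-in Buffer alone. The following is a minimal sketch, not the asker's actual code: the variable names are made up for illustration, and the small lookup table is a hand-written fragment of ISO-8859-2 covering only the three bytes involved.

```javascript
// "webová " as raw ISO-8859-2 bytes; 'á' is the single byte 0xE1.
const rawLatin2 = Buffer.from([0x77, 0x65, 0x62, 0x6f, 0x76, 0xe1, 0x20]);

// Step 1: decode the bytes as UTF-8 by mistake.
// 0xE1 followed by 0x20 is not valid UTF-8, so the decoder substitutes U+FFFD.
const misdecoded = rawLatin2.toString('utf8');
const misdecodedBytes = Buffer.from(misdecoded, 'utf8');
console.log(misdecodedBytes); // the bytes ef bf bd now stand where 'á' was

// Step 2: treat those UTF-8 bytes as ISO-8859-2 and convert them to UTF-8,
// which is what the Iconv('ISO-8859-2', 'UTF-8') call in the question does.
// Partial table (assumption: only these three non-ASCII bytes occur here).
const latin2ToUnicode = { 0xef: '\u010f', 0xbf: '\u017c', 0xbd: '\u02dd' };
const doubleConverted = Array.from(
  misdecodedBytes,
  (b) => latin2ToUnicode[b] || String.fromCharCode(b)
).join('');
console.log(Buffer.from(doubleConverted, 'utf8')); // ... c4 8f c5 bc cb 9d 20
```

The second buffer contains the same c4 8f c5 bc cb 9d sequence as the console output in the question, which suggests the damage happened before the title ever reached iconv.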

The problem is in the Node.js request module: it decodes the response body as UTF-8 by default. I had to set encoding to null so that the body is returned as a raw Buffer, and now everything works fine.

request({ uri: 'https://www.jakpsatweb.cz', encoding: null}, function(err, resp, body){
    .....
})
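Once the body arrives as a raw Buffer, it can be decoded from ISO-8859-2 before the title is extracted. As a rough sketch, the snippet below swaps node-iconv for Node's built-in TextDecoder, which is an assumption on my part: it needs a Node build with full ICU support (the default in official builds since Node 13). The decodeLatin2 helper and the sample bytes are hypothetical stand-ins for the real response body.

```javascript
// Hypothetical helper: decode raw ISO-8859-2 bytes into a JavaScript string.
// Assumes this Node build supports the 'iso-8859-2' label (full ICU).
function decodeLatin2(bytes) {
  return new TextDecoder('iso-8859-2').decode(bytes);
}

// Stand-in for the Buffer that request passes as `body` when encoding is null:
// "webová" in ISO-8859-2, where 'á' is the single byte 0xE1.
const body = Buffer.from([0x77, 0x65, 0x62, 0x6f, 0x76, 0xe1]);

const text = decodeLatin2(body);
console.log(text);

// Writing the string back out as UTF-8 gives the bytes to store in the database.
const utf8Bytes = Buffer.from(text, 'utf8');
console.log(utf8Bytes);
```

With node-iconv, the equivalent step would be passing the raw body Buffer to iconv.convert instead of an already-decoded string.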
