Encoding problems with RSS using Node.js

I'm having trouble reading RSS feeds that use 'special' characters. For example, this feed is in Spanish and contains characters like á, é, í, ó, ú, ü, ç, ñ... When I open it in a browser (Chrome, in my case), those characters are shown correctly.

Now I'm trying to read this feed using the request library on Node.js. This is my code:

const rq = require('request');

module.exports.request = (url, method, json, body, headers) => new Promise((resolve, reject) =>
  rq({
    url,
    method,
    json,
    body,
    headers
  }, (error, response, body) => {
    if (error) {
      reject(error);
    } else {
      resolve(body);
    }
  })
);

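// Call the exported wrapper (not the request library directly); it returns a Promise for the body.
let feed = module.exports.request(URL_HERE, 'GET', false, undefined, HEADERS_HERE);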

In this code, I've tried using HTTP headers like Content-Type: application/rss+xml; charset=utf-8 to force the page to render in UTF-8 (this encoding supports those characters), but it made no difference: when the response is received, the special characters are shown as a ? symbol.

Printing to the console isn't the problem either, because the feed data is saved directly into a Firebase Firestore database and the ? appears in the database too.

I've tried libraries and methods like utf8_encode, utf8_decode and iconv, with the same result: special characters replaced by the ? symbol.

What I'm thinking is that the RSS XML encoding header shows ISO-8859-1, and I'm trying to force the response to be parsed as UTF-8, which is not working correctly. But then why is it shown correctly in the browser?

Thanks!

EDIT

Some results to clarify the comments. The expected result should be:

Las banderas del Ayuntamiento ondearán mañana a media asta.

1: A simple request without decoding

Code:

const request = require('request');

const myRequest = (url, method, json, body, headers) => new Promise((resolve, reject) =>
  request({
    url,
    method,
    json,
    body,
    headers
  }, (error, response, body) => {
    if (error) {
      reject(error);
    } else {
      resolve(body);
    }
  })
);

myRequest('http://www.barakaldo.org/portal/html/rss/noticias/search.jsp?languageId=es_ES', 'GET')
  .then((feed) => console.log(feed))
  .catch((error) => console.error(error));

Result:

Las banderas del Ayuntamiento ondear�n ma�ana a media asta.

2: Same request, but decoding from latin1 and iso-8859-1 using iconv-lite

Code:

const request = require('request');
const iconv = require('iconv-lite');

const myRequest = (url, method, json, body, headers) => new Promise((resolve, reject) =>
  request({
    url,
    method,
    json,
    body,
    headers
  }, (error, response, body) => {
    if (error) {
      reject(error);
    } else {
      resolve(body);
    }
  })
);

myRequest('http://www.barakaldo.org/portal/html/rss/noticias/search.jsp?languageId=es_ES', 'GET')
  .then((feed) => {
    let decodedFeed = iconv.decode(Buffer.from(feed), 'latin1');
    console.log(decodedFeed);
  })
  .catch((error) => console.error(error));

Result:

Las banderas del Ayuntamiento ondear�n ma�ana a media asta.

The feed is encoded in ISO-8859-1, so that's the encoding you need to use to decode it.
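
As a rough illustration of why the replacement character appears, here is a minimal sketch that uses only Node's built-in Buffer. The word "mañana" and the variable names are chosen for the example and do not come from the feed-fetching code above. It shows that ISO-8859-1 bytes read as UTF-8 lose the accented character, and that decoding must happen on the raw bytes, not on a string that has already been decoded.

```javascript
// Bytes of "mañana" as an ISO-8859-1 server would send them (ñ is the single byte 0xF1).
const latin1Bytes = Buffer.from('ma\u00f1ana', 'latin1');

// Reading those bytes as UTF-8 turns the lone 0xF1 into U+FFFD, the replacement character.
const wrong = latin1Bytes.toString('utf8');

// Reading them with the encoding the feed declares recovers the text.
const right = latin1Bytes.toString('latin1');

// Re-encoding the already-damaged string does not bring the original byte back.
const tooLate = Buffer.from(wrong, 'utf8').toString('latin1');

console.log(wrong);   // "ma\uFFFDana"
console.log(right);   // "mañana"
console.log(tooLate === right); // false
```

This suggests why decoding the string body after the fact, as in attempt 2 of the question, cannot help: by then the original byte is gone.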

Putting a Content-Type header field on the request won't have an effect; it would describe the type of the request body (which is empty for GET).
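
Since the encoding is decided by what the server sends and not by what the client asks for, one option is to read it from the response itself. The sketch below assumes the raw body is available as a Buffer; `decodeFeed` is a hypothetical helper written for this example, and it only distinguishes ISO-8859-1 from UTF-8, falling back to UTF-8 for anything else.

```javascript
// Hypothetical helper: decode a raw feed body using the encoding named in its XML declaration.
function decodeFeed(rawBody) {
  // The XML declaration is plain ASCII in both UTF-8 and ISO-8859-1,
  // so reading the first bytes as latin1 is safe for locating it.
  const head = rawBody.subarray(0, 200).toString('latin1');
  const match = /^<\?xml[^>]*encoding=["']([^"']+)["']/i.exec(head);
  const declared = match ? match[1].toLowerCase() : 'utf-8';

  // Map the declared name onto an encoding that Buffer#toString understands.
  // Assumption for this sketch: anything that is not ISO-8859-1 is treated as UTF-8.
  const nodeEncoding = ['iso-8859-1', 'latin1'].includes(declared) ? 'latin1' : 'utf8';
  return rawBody.toString(nodeEncoding);
}

// Sample bytes standing in for the server response (not fetched from the real feed).
const sample = Buffer.from(
  '<?xml version="1.0" encoding="ISO-8859-1"?><title>ondear\u00e1n ma\u00f1ana</title>',
  'latin1'
);
console.log(decodeFeed(sample));
```

With the request library, passing `encoding: null` in the options makes the callback's body a Buffer instead of a string already decoded as UTF-8, which is the form this helper expects. The same Buffer could also be passed to iconv-lite's `iconv.decode(body, 'latin1')`.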
