Node.js: using cheerio to parse XML returns empty CDATA
I am using cheerio in Node.js to parse some RSS feeds. I grab all the items and put them into an array. I am using 3 test feeds, and all of them have a "description" child element for each "item" element. In one of the feeds the whole "description" is wrapped in CDATA, and I can't get its value. Here is an abbreviated code snippet:
// Abbreviated: arrXmlDocs[i] is assumed to hold the XML string of one feed
var cheerio = require('cheerio');
// Open the XML document with cheerio
var $ = cheerio.load(arrXmlDocs[i], { ignoreWhitespace: true, xmlMode: true });
// Loop through every <item>
$('item').each(function (i, xmlItem) {
    // Array to hold the converted item (children are stored as named properties)
    var tempArray = [];
    // Loop through each child of <item>
    $(xmlItem).children().each(function (j, child) {
        // Use the child's tag name as the key and its text as the value
        tempArray[$(this)[0].name] = $(this).text();
    });
});
As expected, the two RSS feeds that don't have CDATA give me an array like this:
[
[
name: 'name of episode',
description:'description of episode',
pubdate: 'published date'
],
[
name: 'name of episode',
description:'description of episode',
pubdate: 'published date'
]
]
and the feed with the CDATA description looks like this:
[
name: 'name of episode',
pubdate: 'published date'
],
So my question is: why is cheerio not returning values wrapped in CDATA, and how can I make it return those values?
This is a known issue (related) with cheerio. It is not yet able to build a correct tree from XML with CDATA in your case. I know this is a disappointing answer; it's a work in progress. While it is being worked on, you can remove the CDATA wrappers with a regular expression:
// replace() returns a new string, so assign the result back before parsing
arrXmlDocs[i] = arrXmlDocs[i].replace(/<!\[CDATA\[([\s\S]*?)\]\]>(?=\s*<)/gi, "$1");
Here is a link to an example jsfiddle.
While this is not an ideal solution, it should suffice until they work this issue out.
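As a rough illustration of the workaround, the regular expression can be wrapped in a small helper that is applied to each feed string before it is handed to cheerio. This is only a sketch: the name stripCdata and the sample feed fragment are made up for this example and are not part of cheerio or of the original feeds.

```javascript
// Hypothetical helper (not part of cheerio): unwrap CDATA sections so their
// contents become ordinary element text before the document is parsed.
function stripCdata(xml) {
    // Same pattern as in the answer: capture the CDATA body lazily and only
    // match when the closing "]]>" is followed by the start of the next tag.
    return xml.replace(/<!\[CDATA\[([\s\S]*?)\]\]>(?=\s*<)/gi, "$1");
}

// Made-up sample fragment, shaped like one <item> of an RSS feed.
var sample =
    "<item>" +
    "<name>name of episode</name>" +
    "<description><![CDATA[description of episode]]></description>" +
    "</item>";

var cleaned = stripCdata(sample);
console.log(cleaned);
```

The cleaned string would then be passed to cheerio.load in place of the original one. One caveat worth keeping in mind: once the wrapper is removed, any "<" or "&" inside the former CDATA body is no longer protected, so descriptions that contain markup may be parsed as child elements rather than plain text.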