
Extract CDATA from RSS XML using JavaScript

I have extracted RSS feed content using JS; however, the 'description' node contains CDATA, and I want to split this out.

For example, for each description node under an item, I would like to extract only the content from <b>Brief Description:</b> up to the first </div>.

Is this possible? Below is an example of what I have so far, followed by the XML from the RSS feed.

Hope someone can help :)

Script Example

<script type="text/javascript">
var xmlhttp, xmlDoc;
if (window.XMLHttpRequest)
  {// code for IE7+, Firefox, Chrome, Opera, Safari
  xmlhttp=new XMLHttpRequest();
  }
else
  {// code for IE6, IE5
  xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
  }

// Synchronous request (third argument false): send() blocks until the feed has arrived
xmlhttp.open("GET","help/Sandbox/XML%20Playground/_layouts/listfeed.aspx?List=%7B1D503F3E%2D4BFF%2D4248%2D848D%2DE12B5B67DAEC%7D",false);
xmlhttp.send();
xmlDoc=xmlhttp.responseXML;

function media(){
  // description[0] is the <channel> description; the item descriptions start at index 1
  var description=xmlDoc.getElementsByTagName('description');
  var a=2;
  var b=1;

  for (var i=0;i<18;i++)
  {
    // childNodes[0] is the CDATA section node; nodeValue is its raw text
    document.write('<p>' + description[b].childNodes[0].nodeValue + '</p>');
    b++;
    a++;
  }
}
</script>

RSS XML FEED

<?xml version="1.0" encoding="UTF-8"?>
<!--RSS generated by Windows SharePoint Services V3 RSS Generator on 8/03/2011 10:51:51 AM-->
<?xml-stylesheet type="text/xsl" href="/help/Sandbox/XML Playground/_layouts/RssXslt.aspx?List=1d503f3e-4bff-4248-848d-e12b5b67daec" version="1.0"?>
<rss version="2.0">
  <channel>
    <title>XML Playground: Media News</title>
    <link>/help/Sandbox/XML Playground/Lists/Media News/AllItems.aspx</link>
    <description>RSS feed for the Media News list.</description>
    <lastBuildDate>Mon, 07 Mar 2011 23:51:51 GMT</lastBuildDate>
    <generator>Windows SharePoint Services V3 RSS Generator</generator>
    <ttl>60</ttl>
    <image>
      <title>XML Playground: Media News</title>
      <url>/help/Sandbox/XML Playground/_layouts/images/homepage.gif</url>
      <link>help/Sandbox/XML Playground/Lists/Media News/AllItems.aspx</link>
    </image>
    <item>
      <title>new Item</title>
      <link>/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=2</link>
      <description><![CDATA[<div><b>Brief Description:</b> <div>bla blah blah ablkahgohoihjsdofsdf dfhfgh</div></div>
<div><b>Thumbnail:</b> <a href="/news/PublishingImages/MySchool_rollup-120-x-120_new-040311.gif">test image</a></div>
]]></description>
      <author>WALKER,Andrew</author>
      <pubDate>Mon, 07 Mar 2011 05:43:19 GMT</pubDate>
      <guid isPermaLink="true">http:/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=2</guid>
    </item>
    <item>
      <title>My School 2.0 launched</title>
      <link>http://dnet.hosts.network/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=1</link>
      <description><![CDATA[<div><b>Brief Description:</b> <div>On Friday 4 March 2011 the Minister for School Education, Peter Garrett, launched My School 2.0.</div></div>
<div><b>Thumbnail:</b> <a href="http://dnet.hosts.network/news/PublishingImages/MySchool_rollup-120-x-120_new-040311.gif"></a></div>
<div><b>Release Date:</b> 16/03/2011</div>
]]></description>
      <pubDate>Fri, 04 Mar 2011 04:34:11 GMT</pubDate>
      <guid isPermaLink="true">/help/Sandbox/XML Playground/Lists/Media News/DispForm.aspx?ID=1</guid>
    </item>
  </channel>
</rss>

CDATA section content is just text, so the DOM won't parse its contents any further. You can either use DOMParser() to re-parse the string contents of the CDATA section as XML and use DOM methods from there, or else use regular expressions.

To use the latter approach, change your document.write() line to this:

// Slice off the first 5 characters to drop the parent <div>, then use [\s\S] to match
//   any character (including newlines) up to the first closing </div> tag
var match = description[b].childNodes[0].nodeValue.slice(5).match(/[\s\S]*?<\/div>/);
document.write('<p>' + (match ? match[0] : '') + '</p>');
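The slice(5) trick assumes every CDATA block starts with exactly "<div>". A slightly more defensive sketch, assuming the SharePoint markup always contains the literal label <b>Brief Description:</b>, searches for that label and cuts at the first </div> after it. extractBriefDescription is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: returns the text from the "Brief Description" label
// up to and including the first closing </div> after it, or null if absent.
function extractBriefDescription(cdataText) {
  var label = '<b>Brief Description:</b>';
  var start = cdataText.indexOf(label);
  if (start === -1) {
    return null; // this item has no Brief Description field
  }
  var end = cdataText.indexOf('</div>', start);
  if (end === -1) {
    return null; // malformed markup: no closing tag after the label
  }
  return cdataText.slice(start, end + '</div>'.length);
}

// Sample CDATA text copied from the second <item> in the feed
var sample =
  '<div><b>Brief Description:</b> <div>On Friday 4 March 2011 the Minister for School Education, Peter Garrett, launched My School 2.0.</div></div>\n' +
  '<div><b>Thumbnail:</b> <a href="http://dnet.hosts.network/news/PublishingImages/MySchool_rollup-120-x-120_new-040311.gif"></a></div>\n';

console.log(extractBriefDescription(sample));
```

Inside the loop you would pass description[b].childNodes[0].nodeValue to the helper instead of using the slice(5).match(...) expression, and check for null before writing the result.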

To use the former approach, which is less than ideal in this case but could be helpful in other situations, you could do this inside the for loop:

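// Wrap the CDATA text in one XHTML-namespaced root (it has several top-level <div>s),
//   then parse the string as XML to get a DOM tree
var cdataContent = new DOMParser().parseFromString('<div xmlns="http://www.w3.org/1999/xhtml">'+description[b].childNodes[0].nodeValue+'</div>', 'text/xml').documentElement;
// firstChild is the first inner <div>, i.e. the "Brief Description" block;
//   importNode copies it from the parsed XML document into the page
document.body.appendChild(document.importNode(cdataContent.firstChild, true));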

...but make sure you only invoke media() after the DOM content has loaded, because document.body has to exist before you can append to it.

And maybe you have some good reason for it, but based on the code you supplied, it'd be a lot simpler just to do this:

for (i=1; i<description.length; i++) {

...and drop a and b entirely (i.e., use i wherever you used b).
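To make the loop bounds concrete without a browser, here is a sketch that replaces the DOM NodeList with a plain array of strings (an assumption for illustration only). Index 0 stands for the channel's own description, which has no CDATA, so the loop starts at 1 and stops at length instead of a hard-coded 18, which would read past the last item in this two-item feed.

```javascript
// Stand-in for xmlDoc.getElementsByTagName('description'): a plain array of the
// text inside each <description>. Index 0 is the channel's description.
var descriptionTexts = [
  'RSS feed for the Media News list.',
  '<div><b>Brief Description:</b> <div>bla blah blah ablkahgohoihjsdofsdf dfhfgh</div></div>\n' +
    '<div><b>Thumbnail:</b> <a href="/news/PublishingImages/MySchool_rollup-120-x-120_new-040311.gif">test image</a></div>\n',
  '<div><b>Brief Description:</b> <div>On Friday 4 March 2011 the Minister for School Education, Peter Garrett, launched My School 2.0.</div></div>\n' +
    '<div><b>Release Date:</b> 16/03/2011</div>\n'
];

var snippets = [];
// Start at 1 to skip the channel description; stop at length so the
// loop never reads past the last <item>.
for (var i = 1; i < descriptionTexts.length; i++) {
  // Same regex as in the answer: drop the outer "<div>", keep up to the first </div>
  var match = descriptionTexts[i].slice(5).match(/[\s\S]*?<\/div>/);
  snippets.push('<p>' + (match ? match[0] : '') + '</p>');
}

console.log(snippets.join('\n'));
```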

And one tip: if you construct the RSS yourself, note that you won't be able to use CDATA sections nested within CDATA sections.
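If you do generate the feed yourself, a common workaround is to make sure the literal terminator "]]>" never appears inside a CDATA section: split each occurrence across two adjacent sections, so "]]>" becomes "]]]]><![CDATA[>". An XML parser concatenates adjacent sections back into the original text. wrapInCdata below is a hypothetical helper sketching that idea; it does not give you true nesting, only safe embedding of text that happens to contain "]]>".

```javascript
// Hypothetical helper: wraps arbitrary text in a CDATA section. A literal "]]>"
// in the text would end the section early, so each occurrence is split: the
// first section ends with "]]" and the next one starts with ">".
function wrapInCdata(text) {
  return '<![CDATA[' + text.split(']]>').join(']]]]><![CDATA[>') + ']]>';
}

console.log(wrapInCdata('<![CDATA[inner]]>'));
// → <![CDATA[<![CDATA[inner]]]]><![CDATA[>]]>
```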

