
Extracting data from rss containing <![CDATA[]]> with PHP

This is a description item I get from the RSS feed:

        <description><![CDATA[ <img src="http://images.24ur.com/media/images/210/Nov2012/61090877.jpg" alt="24ur.com"/>
        Na sedežu Evropske nogometne zveze v Nyonu so izžrebali pare osmine finala Lige prvakov. Bržkone bo najbolj vroče v Madridu, kjer se bo zasedba Reala uvodoma udarila z Manchester Unitedom, povratni dvoboj pa bosta velikana evropskega nogometa odigrala v Manchestru.]]></description>

It contains a CDATA section, whose contents the XML parser does not parse as markup. If I

echo $test->description;

I see the image in the browser, but I cannot access its src attribute in the script. Any idea how to do this?

You cannot access the contents of a CDATA section as XML; the parser treats them as plain text.
You need to extract the src from that text with a regular expression,
or parse the text as a separate XML document.
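
A minimal sketch of the "separate XML document" option, assuming the description text has already been read from the feed as a string. The variable names and the wrapper element name "root" are made up for illustration, and the sample text is shortened. The fragment has no single root element, so it is wrapped in one before parsing; this only works if the embedded HTML happens to be well-formed XML.

```php
<?php
// Hypothetical input: the string value of one <description> element
// (the CDATA wrapper is already gone at this point; text shortened).
$description = ' <img src="http://images.24ur.com/media/images/210/Nov2012/61090877.jpg" alt="24ur.com"/>'
    . ' Shortened sample text.';

// Wrap the fragment in an arbitrary root element so it forms one XML document.
$fragment = simplexml_load_string('<root>' . $description . '</root>');

$src = null;
if ($fragment !== false && isset($fragment->img)) {
    // Attributes are read with array syntax; cast to get a plain string.
    $src = (string) $fragment->img['src'];
}

echo $src, PHP_EOL;
```

If the embedded markup is ordinary HTML rather than XML (unclosed tags, HTML entities), simplexml_load_string is likely to return false, so the result should be checked as above.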

Tested & works:

$h = '<img src="http://images.24ur.com/media/images/210/Nov2012/61090877.jpg" alt="24ur.com"/>';

// Match the first URL starting with "http://", up to (but not including) the closing quote.
preg_match('~http://[^"\']+~', $h, $matches);
var_dump($matches[0]);

Outputs:

string(60) "http://images.24ur.com/media/images/210/Nov2012/61090877.jpg" 

The description contains a single text node (the data of which is a piece of HTML). It doesn't contain any XML elements.

If you want to extract data from the HTML, you need to pass the data of the text node through an HTML parser first.
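
As a rough sketch of that approach, the feed can be read with SimpleXML and the description text handed to DOMDocument::loadHTML. The feed below is a hypothetical minimal wrapper around the description from the question (text shortened), and the variable names are illustrative only.

```php
<?php
// Hypothetical minimal feed; only the <img> tag is taken from the question.
$rss = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <item>
      <description><![CDATA[ <img src="http://images.24ur.com/media/images/210/Nov2012/61090877.jpg" alt="24ur.com"/>
      Shortened sample text.]]></description>
    </item>
  </channel>
</rss>
XML;

$feed = simplexml_load_string($rss);

// Casting the element to string yields the text inside the CDATA section.
$html = (string) $feed->channel->item->description;

// Feed HTML is often not well-formed; collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Take the src attribute of the first <img>, if there is one.
$src = null;
$images = $doc->getElementsByTagName('img');
if ($images->length > 0) {
    $src = $images->item(0)->getAttribute('src');
}

echo $src, PHP_EOL;
```

Unlike a regular expression, this keeps working if the attribute order or quoting changes. One caveat: without a charset hint, loadHTML may misread non-ASCII text in the description; the src value here is plain ASCII, so it is not affected.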

