
Unable to parse a few XML nodes. What protection is applied?


I have an XML feed like this:

    <item>
      <title>Left hopes BJP surge will eat into Mamata’s votes </title>
      <link>http://timesofindia.feedsportal.com/c/33039/f/533916/s/39439a29/sc/7/l/0Ltimesofindia0Bindiatimes0N0Cindia0CLeft0Ehopes0EBJP0Esurge0Ewill0Eeat0Einto0EMamatas0Evotes0Chome0Clok0Csabha0Celections0C20A140Cnews0CLeft0Ehopes0EBJP0Esurge0Ewill0Eeat0Einto0EMamatas0Evotes0Carticleshow0C336252890Bcms/story01.htm</link>
      <description>At times sworn enemies can be of help to each other, albeit indirectly. In the current political winds of West Bengal, no one knows it better than the Left.&lt;img width='1' height='1' src='http://timesofindia.feedsportal.com/c/33039/f/533916/s/39439a29/sc/7/mf.gif' border='0'/&gt;&lt;br clear='all'/&gt;&lt;br/&gt;&lt;br/&gt;&lt;a href="http://da.feedsportal.com/r/194480044196/u/409/f/533916/c/33039/s/39439a29/sc/7/rc/1/rc.htm" rel="nofollow"&gt;&lt;img src="http://da.feedsportal.com/r/194480044196/u/409/f/533916/c/33039/s/39439a29/sc/7/rc/1/rc.img" border="0"/&gt;&lt;/a&gt;&lt;br/&gt;&lt;a href="http://da.feedsportal.com/r/194480044196/u/409/f/533916/c/33039/s/39439a29/sc/7/rc/2/rc.htm" rel="nofollow"&gt;&lt;img src="http://da.feedsportal.com/r/194480044196/u/409/f/533916/c/33039/s/39439a29/sc/7/rc/2/rc.img" border="0"/&gt;&lt;/a&gt;&lt;br/&gt;&lt;a href="http://da.feedsportal.com/r/194480044196/u/409/f/533916/c/33039/s/39439a29/sc/7/rc/3/rc.htm" rel="nofollow"&gt;&lt;img src="http://da.feedsportal.com/r/194480044196/u/409/f/533916/c/33039/s/39439a29/sc/7/rc/3/rc.img" border="0"/&gt;&lt;/a&gt;&lt;br/&gt;&lt;br/&gt;&lt;a href="http://da.feedsportal.com/r/194480044196/u/409/f/533916/c/33039/s/39439a29/sc/7/a2.htm"&gt;&lt;img src="http://da.feedsportal.com/r/194480044196/u/409/f/533916/c/33039/s/39439a29/sc/7/a2.img" border="0"/&gt;&lt;/a&gt;&lt;img width="1" height="1" src="http://pi.feedsportal.com/r/194480044196/u/409/f/533916/c/33039/s/39439a29/sc/7/a2t.img" border="0"/&gt;</description>
      <pubDate>Fri, 11 Apr 2014 19:26:07 GMT</pubDate>
      <guid isPermaLink="false">http://timesofindia.indiatimes.com/india/Left-hopes-BJP-surge-will-eat-into-Mamatas-votes/home/lok/sabha/elections/2014/news/Left-hopes-BJP-surge-will-eat-into-Mamatas-votes/articleshow/33625289.cms</guid>
    </item>

I'm using the Jaunt API to scrape the news headlines and links from this feed.

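    import com.jaunt.*;

    public class FeedScraper {
        public static void main(String[] args) throws JauntException {
            UserAgent agent = new UserAgent();
            agent.visit("http://timesofindia.feedsportal.com/c/33039/f/533916/index.rss");
            Elements items = agent.doc.findEach("<item>");   // every <item> in the feed
            for (Element item : items) {
                String headline = item.findFirst("<title>").getText();   // headline is returned
                String link = item.findFirst("<link>").getText();        // link comes back empty
                System.out.println("headline:" + headline + "\nlink:" + link + "\n");
            }
        }
    }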
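For comparison, here is a minimal sketch of the same headline/link loop written against the JDK's built-in XML DOM parser instead of Jaunt. This is a swapped-in technique, not a fix to the Jaunt code above. It assumes the feed has already been downloaded into a string; the class name RssDomSketch and the sample feed are hypothetical. Because the document is parsed as XML, <link> is an ordinary element and its text content is the URL.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RssDomSketch {
    // Returns {headline, link} pairs for every <item> in the feed text.
    static List<String[]> parseItems(String rss) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(rss)));
        List<String[]> result = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            // In an XML DOM, <link> holds its URL as child text, just like <title>.
            String headline = item.getElementsByTagName("title").item(0).getTextContent();
            String link = item.getElementsByTagName("link").item(0).getTextContent();
            result.add(new String[] {headline, link});
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical, trimmed-down feed standing in for the downloaded RSS.
        String rss = "<rss><channel>"
                + "<item><title>First headline</title><link>http://example.com/a</link></item>"
                + "<item><title>Second headline</title><link>http://example.com/b</link></item>"
                + "</channel></rss>";
        for (String[] pair : parseItems(rss)) {
            System.out.println("headline:" + pair[0] + "\nlink:" + pair[1] + "\n");
        }
    }
}
```

The escaped HTML inside <description> (the &lt;a ...&gt; runs) is plain character data to an XML parser, so it does not create extra elements that could confuse the lookups.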

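Now I get all the headlines, but the links come back empty! The same thing happened when I scraped another newspaper's feed. Is there something special (some encoding?) about the link node that makes it return null, or am I doing something wrong?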
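One plausible explanation, offered as an assumption because nothing in this thread confirms how Jaunt parses the feed: in HTML, <link> is a void element, so an HTML-oriented parser closes it immediately and the URL ends up as text that follows the element rather than inside it. Under that assumption, a crude fallback is to read the text after each <link> tag straight from the raw feed. The helper below (RawLinkFallback.extractLinks is a hypothetical name) does this with plain string searches. It does not decode entities such as &amp;, and it will also pick up a channel-level <link> if the feed has one.

```java
import java.util.ArrayList;
import java.util.List;

public class RawLinkFallback {
    // Hypothetical helper: collects the text that directly follows each literal "<link>"
    // tag in the raw feed, stopping at the next '<', and trims surrounding whitespace.
    static List<String> extractLinks(String rawFeed) {
        List<String> links = new ArrayList<>();
        String open = "<link>";
        int from = 0;
        while (true) {
            int start = rawFeed.indexOf(open, from);
            if (start < 0) {
                break;                       // no more <link> tags
            }
            start += open.length();
            int end = rawFeed.indexOf('<', start);
            if (end < 0) {
                end = rawFeed.length();      // unterminated: take the rest of the text
            }
            links.add(rawFeed.substring(start, end).trim());
            from = end;
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical sample input.
        String raw = "<item><title>A</title><link>http://example.com/a</link></item>"
                + "<item><title>B</title><link>http://example.com/b</link></item>";
        for (String link : extractLinks(raw)) {
            System.out.println("link:" + link);
        }
    }
}
```

Parsing the feed as XML is the cleaner route where possible; string scanning like this is brittle and only makes sense as a stopgap.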

I'm not sure, but findFirst may not handle <link>, because findFirst is geared more toward comments. Would getFirst with an appropriate query work?


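Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.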

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM