Unable to parse value from HTML using jsoup
I'm relatively new to jsoup, and I can't seem to find the correct selector query to parse out the value I'm looking for. The HTML is as follows:
<img src='http://rootzwiki.com/public/style_images/ginger/t_unread.png' alt='New Replies' /><br />
</a>
</td>
<td class='col_f_content '>
<h4><a id="tid-link-12251" href="http://rootzwiki.com/topic/12251-romlte-rootzboat-403-v61/" title='View topic, started 17 December 2011 - 09:32 AM' class='topic_title'>[ROM][LTE] RootzBoat 4.0.3 V6.1</a></h4>
<br />
<span class='desc lighter blend_links'>
Started by <a hovercard-ref="member" hovercard-id="5" class="_hovertrigger url fn " href='http://rootzwiki.com/user/5-birdman/'>birdman</a>, 17 Dec 2011
</span>
<ul class='mini_pagination'>
<li><a href="http://rootzwiki.com/topic/12251-romlte-rootzboat-403-v61/" title='Go to page 1'>1</a></li>
<li><a href="http://rootzwiki.com/topic/12251-romlte-rootzboat-403-v61/page__st__10" title='Go to page 2'>2</a></li>
<li><a href="http://rootzwiki.com/topic/12251-romlte-rootzboat-403-v61/page__st__20" title='Go to page 3'>3</a></li>
<li><a href="http://rootzwiki.com/topic/12251-romlte-rootzboat-403-v61/page__st__1990" title='Go to page 200'>200 →</a></li>
</ul>
</td>
<td class='col_f_preview __topic_preview'>
<a href='http://rootzwiki.com/topic/12251-romlte-rootzboat-403-v61/' class='expander closed' title='Preview this topic'> </a>
</td>
<td class='col_f_views desc blend_links'>
<ul>
<li>
<span class='ipsBadge ipsBadge_orange'>Hot</span>
<a href="http://rootzwiki.com/index.php?app=forums&module=extras&section=stats&do=who&t=12251" onclick="return ipb.forums.retrieveWhoPosted( 12251 );">1,999 replies</a>
</li>
<li class='views desc'>180,213 views</li>
</ul>
</td>
<td class='col_f_post'>
<a href='http://rootzwiki.com/user/49940-jakeday/' class='ipsUserPhotoLink left'>
<img src='http://rootzwiki.com/uploads/profile/photo-thumb-49940.jpg' class='ipsUserPhoto ipsUserPhoto_mini' />
</a>
<ul class='last_post ipsType_small'>
<li><a hovercard-ref="member" hovercard-id="49940" class="_hovertrigger url fn " href='http://rootzwiki.com/user/49940-jakeday/'>jakeday</a></li>
<li>
<a href='http://rootzwiki.com/topic/12251-romlte-rootzboat-403-v61/page__view__getlastpost' title='Go to last post'>Today, 04:20 AM</a>
</li>
</ul>
</td>
I need to parse out "birdman" from there. I know that once I've selected the element, I can get "birdman" out with author.text(), but I can't figure out how to select the author element. I thought the following block of code might work, but as I mentioned, I'm pretty new to jsoup and HTML, and it obviously didn't work. There's nothing wrong with the connection, and jsoup is working for the other values I parsed out.
TitleResults titleArray = new TitleResults();
Document doc = null;
try {
    doc = Jsoup.connect(Constants.FORUM).get();
} catch (IOException e) {
    e.printStackTrace();
}
// Thread titles and links
Elements threads = doc.select(".topic_title");
for (Element thread : threads) {
    titleArray = new TitleResults();
    // Thread title
    threadTitle = thread.text();
    titleArray.setItemName(threadTitle);
    // Thread link, with the "/page__view__getnewpost" suffix trimmed off
    String threadStr = thread.attr("abs:href");
    String endTag = "/page__view__getnewpost";
    threadStr = threadStr.replace(endTag, "");
    threadArray.add(threadStr);
    titleArray.setAuthorDate("Author/Date");
    results.add(titleArray);
}
// Attempt at the author: every link that carries a hovercard-ref attribute
Elements authors = doc.select("a[hovercard-ref]");
for (Element author : authors) {
    if (author.attr("abs:href").contains("/user/")) {
        Log.d("POC", "SUCCESS " + author.attr("abs:href"));
    } else {
        Log.d("POC", "FAILURE " + author.text());
    }
}
}
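The loop above keeps links whose href contains "/user/" but logs the URL rather than the name; calling author.text() on those same elements gives the display name. If only the href is at hand, the name-like part can be pulled out of the URL. This is a minimal sketch, assuming profile URLs always follow the /user/<id>-<slug>/ pattern seen in the two profile links in the HTML; the class ProfileSlug and the method slugFromProfileUrl are hypothetical names, not part of jsoup.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ProfileSlug {

    // Assumed URL shape, based on http://rootzwiki.com/user/5-birdman/ and
    // http://rootzwiki.com/user/49940-jakeday/: /user/<numeric id>-<slug>/
    private static final Pattern USER_PATH = Pattern.compile("/user/(\\d+)-([^/]+)/?$");

    // Returns the slug part of a profile URL, or empty if the URL does not match.
    static Optional<String> slugFromProfileUrl(String href) {
        Matcher m = USER_PATH.matcher(href);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(slugFromProfileUrl("http://rootzwiki.com/user/5-birdman/").orElse("?"));
        System.out.println(slugFromProfileUrl("http://rootzwiki.com/user/49940-jakeday/").orElse("?"));
        // A topic link has no /user/ segment, so nothing is extracted.
        System.out.println(slugFromProfileUrl(
                "http://rootzwiki.com/topic/12251-romlte-rootzboat-403-v61/").isPresent());
    }
}
```

The slug is only a URL fragment and may not match the displayed name exactly, so author.text() remains the more direct way to get "birdman".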
I think you're thinking too hard ;)
To get the birdman portion of the link, just use the following:
Elements authors = doc.select("a");
for (Element author : authors) {
    Log.d("POC", author.text());
}
The "a" selector retrieves all links. After that, you can just use .text(), as you said, to retrieve the value.
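A bare "a" selector (or the question's "a[hovercard-ref]") matches every link of that kind in the row, so the topic starter and the last poster both come back. The sketch below, assuming jsoup is on the classpath and the live page uses the same classes as the HTML in the question, parses a trimmed copy of that row offline with Jsoup.parse and scopes the selector to the "Started by" span. The class AuthorSelectorDemo and the helpers topicStarter and lastPoster are hypothetical names used only for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class AuthorSelectorDemo {

    // Trimmed copy of one topic row from the question's HTML (assumption: the
    // live page uses the same classes and attributes).
    static final String ROW_HTML =
          "<h4><a id=\"tid-link-12251\" class='topic_title'"
        + " href=\"http://rootzwiki.com/topic/12251-romlte-rootzboat-403-v61/\">"
        + "[ROM][LTE] RootzBoat 4.0.3 V6.1</a></h4>"
        + "<span class='desc lighter blend_links'>Started by"
        + " <a hovercard-ref=\"member\" hovercard-id=\"5\""
        + " href='http://rootzwiki.com/user/5-birdman/'>birdman</a>, 17 Dec 2011</span>"
        + "<ul class='last_post ipsType_small'><li>"
        + "<a hovercard-ref=\"member\" hovercard-id=\"49940\""
        + " href='http://rootzwiki.com/user/49940-jakeday/'>jakeday</a></li></ul>";

    // Text of the "Started by" link, or null if the row has none.
    static String topicStarter(Document doc) {
        // Scoping to span.desc excludes the last-poster link in ul.last_post.
        Element author = doc.select("span.desc a[hovercard-ref=member]").first();
        return author == null ? null : author.text();
    }

    // Text of the last-poster link, or null if the row has none.
    static String lastPoster(Document doc) {
        Element poster = doc.select("ul.last_post a[hovercard-ref=member]").first();
        return poster == null ? null : poster.text();
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(ROW_HTML);
        System.out.println(doc.select("a").size());                // 3: every link in the row
        System.out.println(doc.select("a[hovercard-ref]").size()); // 2: both member links
        System.out.println(topicStarter(doc));                     // birdman
        System.out.println(lastPoster(doc));                       // jakeday
    }
}
```

On the full page fetched with Jsoup.connect(...).get(), the same scoped selector should return one element per topic row (assuming every row has this structure), so you would loop over the returned Elements instead of calling first().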
Selvin answered it in the comments. I wasn't getting the source correctly, and it was causing errors: http://pastebin.com/xfUQkGw0