How could I use ruby to extract information from a table consisting of these rows? Is it possible to detect the comments using nokogiri?
<!-- Begin Topic Entry 4134 -->
<tr>
<td align="center" class="row2"><image src='style_images/ip.boardpr/f_norm.gif' border='0' alt='New Posts' /></td>
<td align="center" width="3%" class="row1"> </td>
<td class="row2">
<table class='ipbtable' cellspacing="0">
<tr>
<td valign="middle"><alink href='http://www.xxx.com/index.php?showtopic=4134&view=getnewpost'><image src='style_images/ip.boardpr/newpost.gif' border='0' alt='Goto last unread' title='Goto last unread' hspace=2></a></td>
<td width="100%">
<div style='float:right'></div>
<div> <alink href="http://www.xxx.com/index.php?showtopic=4134&hl=">EXTRACT LINK 1</a> </div>
</td>
</tr>
</table>
<span class="desc">EXTRACT DESCRIPTION</span>
</td>
<td class="row2" width="15%"><span class="forumdesc"><alink href="http://www.xxx.com/index.php?showforum=19" title="Living">EXTRACT LINK 2</a></span></td>
<td align="center" class="row1" width='10%'><alink href='http://www.xxx.com/index.php?showuser=1642'>Mr P</a></td>
<td align="center" class="row2"><alink href="javascript:who_posted(4134);">1</a></td>
<td align="center" class="row1">46</td>
<td class="row1"><span class="desc">Today, 12:04 AM<br /><alink href="http://www.xxx.com/index.php?showtopic=4134&view=getlastpost">Last post by:</a> <b><alink href='http://www.xxx.com/index.php?showuser=1649'>underft</a></b></span></td>
</tr>
<!-- End Topic Entry 4134 -->
-->
Try to use xpath
instead:
html_doc = Nokogiri::HTML("<html><body><!-- Begin Topic Entry 4134 --></body></html>")
html_doc.xpath('//comment()')
You could implement a Nokogiri SAX Parser . This is done faster than it might seem at first sight. You get events for Elements, Attributes and Comments.
Within your parser, your should rememeber the state, like @currently_interested = true to know which parts to rememeber and which not.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.