Parsing HTML with Ruby and Nokogiri using HTML comments as markers

Question

How can I use Ruby to extract information from a table consisting of rows like the one below? Is it possible to detect the comments using Nokogiri?

<!-- Begin Topic Entry 4134 --> 
    <tr> 
        <td align="center" class="row2"><img src='style_images/ip.boardpr/f_norm.gif' border='0'  alt='New Posts' /></td>
        <td align="center" width="3%" class="row1">&nbsp;</td> 
        <td class="row2"> 
            <table class='ipbtable' cellspacing="0"> 
                <tr> 

<td valign="middle"><a href='http://www.xxx.com/index.php?showtopic=4134&amp;view=getnewpost'><img src='style_images/ip.boardpr/newpost.gif' border='0'  alt='Goto last unread' title='Goto last unread' hspace=2></a></td>

                    <td width="100%"> 
                    <div style='float:right'></div> 
                    <div> <a href="http://www.xxx.com/index.php?showtopic=4134&amp;hl=">EXTRACT LINK 1</a>  </div>
                    </td> 
                </tr> 
            </table> 
            <span class="desc">EXTRACT DESCRIPTION</span> 
        </td> 
        <td class="row2" width="15%"><span class="forumdesc"><a href="http://www.xxx.com/index.php?showforum=19" title="Living">EXTRACT LINK 2</a></span></td>
        <td align="center" class="row1" width='10%'><a href='http://www.xxx.com/index.php?showuser=1642'>Mr P</a></td>
        <td align="center" class="row2"><a href="javascript:who_posted(4134);">1</a></td>
        <td align="center" class="row1">46</td>
        <td class="row1"><span class="desc">Today, 12:04 AM<br /><a href="http://www.xxx.com/index.php?showtopic=4134&amp;view=getlastpost">Last post by:</a> <b><a href='http://www.xxx.com/index.php?showuser=1649'>underft</a></b></span></td>
    </tr> 
<!-- End Topic Entry 4134 -->

Answer 1

Try using XPath instead; the comment() node test selects comment nodes:

require 'nokogiri'

html_doc = Nokogiri::HTML("<html><body><!-- Begin Topic Entry 4134 --></body></html>")
html_doc.xpath('//comment()')  # NodeSet containing the comment node

Answer 2

You could implement a Nokogiri SAX parser. This takes less work than it might seem at first sight. Your handler receives callbacks for elements (together with their attributes) and for comments.

Within your parser, you should remember the state, for example @currently_interested = true, so that you know which parts to record and which to skip.

Answer 1: score 7, posted 2011-10-14 16:07:07
Answer 2: score 0, accepted, posted 2009-07-25 12:03:49
