
How do I extract specific text from the HTML returned by an XPath query?

I am using XPath in Ruby (REXML) with the following statement:

print XPath.first(Document.new(html),"//tr[@id='ctl00_c1_rr_ci_trAdd']//td[2]") 

The query returns the following:

<td>

                1371 N Belsay Rd<br/>Burton, MI 48509
                <br/>
                <a href='http://www.mapquest.com/maps/map.adp?style=2&amp;address=1371+N+Belsay+Rd&amp;city=Burton&amp;state=MI&amp;zip=48509' class='rptLnk2' id='ctl00_c1_rr_ci_hlMapQuest' target='_blank'>See the location on a Mapquest Map</a>
                <br/>
                <a href='http://maps.google.com?q=1371+N+Belsay+Rd Burton, MI 48509' class='rptLnk2' id='ctl00_c1_rr_ci_hlGoogleMaps' target='_blank'>See the location on a Google Map</a>
            </td>
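Although the printed output looks like text, REXML's XPath.first returns an REXML::Element here, not a String; print simply serializes it. A minimal sketch to confirm this, assuming a cut-down, hypothetical version of the page's markup (only the tr id comes from the question; the first td and the surrounding table are invented for illustration):

```ruby
require 'rexml/document'

# Hypothetical, simplified markup: only the tr id is taken from the question.
html = <<~XML
  <table>
    <tr id='ctl00_c1_rr_ci_trAdd'>
      <td>Address</td>
      <td>
        1371 N Belsay Rd<br/>Burton, MI 48509
        <br/>
        <a href='http://maps.google.com?q=1371+N+Belsay+Rd' target='_blank'>See the location on a Google Map</a>
      </td>
    </tr>
  </table>
XML

doc = REXML::Document.new(html)
td = REXML::XPath.first(doc, "//tr[@id='ctl00_c1_rr_ci_trAdd']//td[2]")

puts td.class          # REXML::Element
puts td.is_a?(String)  # false
puts td.to_s           # the serialized <td>...</td>, which is what print shows
```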

But I only want this text:

1371 N Belsay Rd<br/>Burton, MI 48509

How can I achieve this? When I call scan on the result, I get this error:

private method `scan' called for <td> ... </>:REXML::Element (NoMethodError)
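The error above occurs because scan is a String method, while XPath.first returned an REXML::Element. One option, sketched here rather than prescribed, is to skip regular expressions and read the element's text children with REXML's Element#texts. This assumes hypothetical sample markup and that the address is the first two text nodes of the td:

```ruby
require 'rexml/document'

# Hypothetical sample markup standing in for the real page.
html = <<~XML
  <table>
    <tr id='ctl00_c1_rr_ci_trAdd'>
      <td>Address</td>
      <td>
        1371 N Belsay Rd<br/>Burton, MI 48509
        <br/>
        <a href='http://maps.google.com?q=1371+N+Belsay+Rd' target='_blank'>See the location on a Google Map</a>
      </td>
    </tr>
  </table>
XML

td = REXML::XPath.first(REXML::Document.new(html),
                        "//tr[@id='ctl00_c1_rr_ci_trAdd']//td[2]")

# Element#texts returns only the REXML::Text children; <br/> and <a> are skipped.
# Assumption: the two address lines are the first two text children.
lines = td.texts.first(2).map { |t| t.value.strip }
address = lines.join('<br/>')

p lines       # ["1371 N Belsay Rd", "Burton, MI 48509"]
puts address  # 1371 N Belsay Rd<br/>Burton, MI 48509
```

If you do prefer the regular-expression route, call scan on td.to_s, which returns the element's markup as a String.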

An XPath expression that selects the text 1371 N Belsay Rd as a text node is:

((//tr[@id='ctl00_c1_rr_ci_trAdd'])//td)[2]/text()[1]
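Evaluated from Ruby, a text() query like this returns an REXML::Text node; its value method gives the string, including the whitespace around it in the source, so strip it. This sketch uses the simpler path //tr[@id='ctl00_c1_rr_ci_trAdd']/td[2]/text()[1], which selects the same node in the hypothetical sample markup because the td elements there are direct children of the tr. In general the two forms differ: (...//td)[2] picks the second td descendant in document order, while .../td[2] picks the second td child of each matching tr.

```ruby
require 'rexml/document'

# Hypothetical sample markup; here the td elements are direct children of the tr.
html = <<~XML
  <table>
    <tr id='ctl00_c1_rr_ci_trAdd'>
      <td>Address</td>
      <td>
        1371 N Belsay Rd<br/>Burton, MI 48509
        <br/>
      </td>
    </tr>
  </table>
XML

doc = REXML::Document.new(html)

# First text child of the second td: an REXML::Text, not a String.
node = REXML::XPath.first(doc, "//tr[@id='ctl00_c1_rr_ci_trAdd']/td[2]/text()[1]")

street = node.value.strip
puts street  # 1371 N Belsay Rd
```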

If you want the expression to select all three nodes:

1371 N Belsay Rd<br/>Burton, MI 48509

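you can use this one:

((//tr[@id='ctl00_c1_rr_ci_trAdd'])//td)[2]/node()[not(position() > 3)]

It selects the first three child nodes of that td: the text node 1371 N Belsay Rd, the br element, and the text node Burton, MI 48509. The result is a node-set, not a string; passing it to normalize-space() would not join the three nodes, because XPath 1.0 converts a node-set argument to the string value of its first node only.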
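To use the three nodes from Ruby, one approach is to fetch them with REXML::XPath.match and serialize each one with to_s, which reproduces the markup asked for in the question. The sketch keeps only the node-selecting path, with no string function around it, so REXML returns the nodes themselves. As before, it assumes hypothetical sample markup and uses the ungrouped .../td[2]/... form, which is equivalent for that markup.

```ruby
require 'rexml/document'

# Hypothetical sample markup standing in for the real page.
html = <<~XML
  <table>
    <tr id='ctl00_c1_rr_ci_trAdd'>
      <td>Address</td>
      <td>
        1371 N Belsay Rd<br/>Burton, MI 48509
        <br/>
      </td>
    </tr>
  </table>
XML

doc = REXML::Document.new(html)

# Text node, <br/> element, text node: the first three children of the td.
nodes = REXML::XPath.match(
  doc, "//tr[@id='ctl00_c1_rr_ci_trAdd']/td[2]/node()[not(position() > 3)]"
)

# to_s serializes each node (the br element becomes "<br/>"); strip then
# removes the indentation surrounding the address.
address = nodes.map(&:to_s).join.strip
puts address  # 1371 N Belsay Rd<br/>Burton, MI 48509
```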
