How to parse an href in Ruby on Rails using Nokogiri

I am using Nokogiri as my HTML parser.

<html>
<body>
<form>
<table>
    <tr><td>Some Text</td></tr>
    <tr>
        <td colspan="2" align="center">
            <br />
            <a href="TransportRoom?servlet=CaseSearch.jsp&amp;advancedSearch=Advanced">
                Advanced Search
            </a>
            <br />
            &nbsp;
        </td>
    </tr>
</table>
</form>
</body>
</html>

From this HTML I want to extract the href of the "Advanced Search" link. The HTML is stored in a variable named doc1.

Can anyone help me with this?

It should be as simple as:

doc = Nokogiri::HTML(doc1)
href = doc.css("a").first.attr('href')

Is this what you want?

The first answer works for me, but if the page contains any number of links, you can pick out the right one by its text like this:

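html = Nokogiri::HTML(doc1)

# Declare the variable before the block so it is still in scope afterwards.
advanced_search_link = nil
html.css("a").each do |element|
  if element.text.strip == 'Advanced Search'
    advanced_search_link = element.attr('href')
  end
end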

I would do it as below:

require 'nokogiri'

@doc = Nokogiri.HTML <<-eotl
<html>
<body>
<form>
<table>
    <tr><td>Some Text</td></tr>
    <tr>
        <td colspan="2" align="center">
            <br />
            <a href="TransportRoom?servlet=CaseSearch.jsp&amp;advancedSearch=Advanced">
                Advanced Search
            </a>
            <br />
            &nbsp;
        </td>
    </tr>
</table>
</form>
</body>
</html>
eotl

@doc.at_xpath("//a[normalize-space(.)='Advanced Search']")['href']
# => "TransportRoom?servlet=CaseSearch.jsp&advancedSearch=Advanced"
