Generate XPath from parsed HTML with Ruby on Rails

Given the following example HTML:

<table cellpadding="4" cellspacing="0" border="0" width="100%">
  <tbody>
    <tr bgcolor="#FFE4D8" valign="top">
    <td>in the next 20 minutes you will learn how to create a winter landscape. For this excersize you do need to have only a basic experience in Lightwave, so lets just start with it.<br>
  </tbody>
</table>

How could I automatically generate an XPath expression for the tag that contains "20 minutes", in the same manner that FirePath does? Is this possible from within Ruby?

Assuming the text is not split across different tags, you can find the lowest node that contains it with

//*[contains(text(),'20 minutes')]
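A minimal Ruby sketch of this lookup, assuming REXML as the parser (the original answer names no specific library for this step). REXML only accepts well-formed XML, so the sample below is a cleaned-up stand-in for the markup in the question, with the td and br closed; the variable names are illustrative.

```ruby
require 'rexml/document'

# Well-formed stand-in for the question's markup (assumption: REXML is used,
# and REXML rejects unclosed tags such as <br>).
source = <<~XML
  <html>
    <body>
      <table>
        <tbody>
          <tr>
            <td>in the next 20 minutes you will learn how to create a winter landscape.<br/></td>
          </tr>
        </tbody>
      </table>
    </body>
  </html>
XML

doc = REXML::Document.new(source)

# First element whose own text contains the phrase.
node = REXML::XPath.first(doc, "//*[contains(text(),'20 minutes')]")
puts node.name
```

For real-world HTML that is not well-formed, an HTML-tolerant parser would be needed instead; the XPath expression itself stays the same.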

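You can then collect the chain of parents by appending /.. to the XPath until you reach the root element html. At every step you also need the position of the element among its siblings with the same tag name. Note that position() cannot be used as a path step in XPath 1.0, which is what Nokogiri and REXML implement; counting the preceding siblings gives the index instead. For the td in the example:

count(//*[contains(text(),'20 minutes')]/preceding-sibling::td) + 1

and for higher elements

count(//*[contains(text(),'20 minutes')]/../preceding-sibling::tr) + 1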

After you know each tag name and position, you can build the path

/html[1]/body[1]/div[x]/table[y]/tbody[z]/tr[1]/td[1]

Here x, y and z are placeholders for the positions found in the previous step.
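The steps above can be sketched in Ruby roughly as follows. This assumes REXML and walks the tree through each node's parent rather than re-evaluating an XPath per level; `indexed_xpath` is a hypothetical helper name, not part of any library, and the sample document is invented for illustration.

```ruby
require 'rexml/document'

# Hypothetical helper (not part of REXML): returns an absolute XPath with a
# 1-based index on every step, e.g. /html[1]/body[1]/table[1].
def indexed_xpath(element)
  steps = []
  node = element
  # Stop at the document node, which sits above the root element.
  until node.nil? || node.is_a?(REXML::Document)
    # A detached element has no parent; treat it as its own only sibling.
    siblings = node.parent ? node.parent.children : [node]
    # Keep only element siblings with the same tag name, in document order.
    same_named = siblings.select do |child|
      child.is_a?(REXML::Element) && child.name == node.name
    end
    position = same_named.index { |sibling| sibling.equal?(node) } + 1
    steps.unshift("#{node.name}[#{position}]")
    node = node.parent
  end
  "/#{steps.join('/')}"
end

doc = REXML::Document.new(<<~XML)
  <html>
    <body>
      <table>
        <tbody>
          <tr><td>intro</td></tr>
          <tr><td>skip</td><td>in the next 20 minutes you will learn</td></tr>
        </tbody>
      </table>
    </body>
  </html>
XML

target = REXML::XPath.first(doc, "//*[contains(text(),'20 minutes')]")
puts indexed_xpath(target)
```

Indexing every step, even when an element has no same-named siblings, keeps the output in the same shape as the path shown above.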

I don't know Ruby, so I can't provide source code, but the algorithm is simple, and it can be implemented with any DOM parser that supports XPath. It can be optimized considerably if the DOM parser has a method that returns the parent of a node: evaluating a new XPath expression for every step up the tree is slow and does not scale to many or long documents.

It looks like REXML for Ruby provides a parent method on its nodes.
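As a small illustration, assuming REXML and a made-up one-line document, the ancestors can be collected by following parent until the document node is reached. REXML elements also expose an xpath method that returns a path string; its exact output format is not asserted here.

```ruby
require 'rexml/document'

# Illustrative document; the markup is an assumption, not from the question.
doc = REXML::Document.new(
  '<html><body><table><tr><td>20 minutes</td></tr></table></body></html>'
)
td = REXML::XPath.first(doc, '//td')

# Collect tag names from the element up to the root via parent.
names = []
node = td
until node.nil? || node.is_a?(REXML::Document)
  names.unshift(node.name)
  node = node.parent
end
puts names.join('/')

# REXML can also report an element's path directly.
puts td.xpath
```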

You can also try building the XPath yourself with the jini library.

xpath = Jini.new('parent')
            .add_path('child')
            .add_attr('key', 'value')
            .to_s
puts xpath # parent/child[@key="value"]
