How to get all XPaths that match a given regex?

Question

Is there a Python library that can return the XPaths of all DOM nodes matching a given regex?

I am trying to fetch question-and-answer pairs from an FAQ page.

These are three different XPaths of questions from this site:

xpath1: /html/body/div[1]/div[2]/div[3]/div[2]/div/div[2]/div/div[1]/div/div[7]/div[1]/a/span
xpath2: /html/body/div[1]/div[2]/div[3]/div[2]/div/div[2]/div/div[1]/div/div[10]/div[1]/a/span
xpath3: /html/body/div[1]/div[2]/div[3]/div[2]/div/div[2]/div/div[3]/div[1]/div[1]/div[1]/a/span

Now let the regex be something like this:

/html/body/div[1]/div[2]/div[3]/div[2]/div/div[2]/div/ * / * / * /div[1]/a/span

Is it possible to get all XPaths that satisfy such a regex through some Python library?

I tried using Scrapy selectors to fetch all the questions, but that fails when fetching the answers. So I want to go through all the questions and then fetch their answers, and for this I need the question XPaths.

Answer 1

You don't need a tool or a regex, and you don't need absolute XPath expressions either. Try the XPath below to match all questions on the page:

//div[@class="ClsInnerDrop"]/a

If you don't know how to write your own selectors, check this cheatsheet
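As a rough illustration of how the relative XPath above could be applied with lxml, here is a minimal sketch. The HTML in SAMPLE is a made-up stand-in for the FAQ page (only the class name ClsInnerDrop comes from the answer), and find_questions is a hypothetical helper name, so the real page structure may differ.

```python
from lxml import html

# Hypothetical stand-in for the FAQ page; only the class name is taken from the answer.
SAMPLE = """
<html><body>
  <div class="faq">
    <div class="ClsInnerDrop"><a href="#q1"><span>How do I reset my password?</span></a></div>
    <div class="answer">Use the reset link.</div>
    <div class="ClsInnerDrop"><a href="#q2"><span>How do I close my account?</span></a></div>
    <div class="answer">Contact support.</div>
  </div>
</body></html>
"""


def find_questions(page_source):
    """Return (absolute_xpath, text) for every question link the relative XPath matches."""
    root = html.fromstring(page_source)
    tree = root.getroottree()
    links = root.xpath('//div[@class="ClsInnerDrop"]/a')
    # getpath() turns each matched element back into an absolute XPath.
    return [(tree.getpath(a), a.text_content().strip()) for a in links]


questions = find_questions(SAMPLE)
for path, text in questions:
    print(path, "->", text)
```

Because the selector keys on the class attribute rather than on position, it should keep working when the nesting depth of the questions varies, which is the case the three absolute XPaths in the question run into.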

Answer 2

Finally, I found a solution for this by combining lxml and Scrapy. I used @Andersson's answer to find all the text content with the selector, and then for each text I iterated over the tree and called tree.getpath() from lxml.

The solution is not regex-based, but it solved my use case, so I am posting it.

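import requests
from lxml import html


def get_xpath_for_text(tree, text):
    """Return the absolute XPath of the first node whose text equals `text`, or ' ' if none."""
    try:
        for tag in tree.iter():
            if tag.text and tag.text == text:
                return tree.getpath(tag)
        return ' '
    except Exception:
        return ' '


# `url` (the FAQ page address) and `text` (one question string found
# with the selector) must be defined by the caller before this runs.
webpage = requests.get(url)
html_content = html.fromstring(webpage.text)
tree = html_content.getroottree()
get_xpath_for_text(tree, text)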
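For readers who still want the regex-style matching asked about in the question, one possible sketch is to collect absolute XPaths as strings (for example with tree.getpath() over tree.iter(), as above) and filter them with a regex built from the wildcard pattern. This is not the approach of either answer. The helper names wildcard_to_regex and filter_xpaths are hypothetical, the first three candidates are the XPaths from the question, and the fourth is an invented non-matching path added for contrast.

```python
import re


def wildcard_to_regex(pattern):
    """Compile an XPath-like pattern in which a bare `*` step matches any single step."""
    steps = [step.strip() for step in pattern.strip().split("/")]
    # Literal steps are escaped so that `[` and `]` are matched verbatim.
    parts = ["[^/]+" if step == "*" else re.escape(step) for step in steps]
    return re.compile("/".join(parts))


def filter_xpaths(xpaths, pattern):
    """Keep only the XPath strings that match the whole wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    return [path for path in xpaths if regex.fullmatch(path)]


PREFIX = "/html/body/div[1]/div[2]/div[3]/div[2]/div/div[2]/div"
PATTERN = PREFIX + "/ * / * / * /div[1]/a/span"

candidates = [
    PREFIX + "/div[1]/div/div[7]/div[1]/a/span",      # xpath1 from the question
    PREFIX + "/div[1]/div/div[10]/div[1]/a/span",     # xpath2 from the question
    PREFIX + "/div[3]/div[1]/div[1]/div[1]/a/span",   # xpath3 from the question
    PREFIX + "/div[1]/a/span",                        # invented path, too shallow to match
]

matches = filter_xpaths(candidates, PATTERN)
for path in matches:
    print(path)
```

The three question paths are kept and the shallower invented path is dropped. Note that the literal steps are compared as plain strings, so the pattern has to use the same indexing style that getpath() produces (for instance `div` rather than `div[1]` for an only child).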

2 answers

solution1
2 ACCEPTED 2018-09-12 10:05:12

solution2
0 2018-09-17 10:11:28
