
Matching the last two HTML 'class' attributes encountered before a variable in Python regex

Regex is definitely giving me a headache. Every time I move one step ahead, I feel like I am taking two steps back. I am trying to extract the class attribute of the last tag before the one containing a given first name.

I randomly found that website, which I thought would be a good example to practice on. I am trying to write a general rule, nothing specific to that website. The only assumption is that I know what the first name is and that it is contained in a tag (div, span, h1, ...) with a certain class. Here are my regex trials:

re.findall(r'(?:class="(.+)".+){2}.*' + val, source) #'source' is the source code of the page
re.findall(r'(?:class="(.+)".*class=)+' + val, source) #'val' a name that I know is in the page

Any explanations on what is wrong or on what to do to succeed in my task would be highly appreciated. Thanks a lot and stay safe.

Here is the solution that I found. Given any keyword, the goal is to retrieve the text of the elements that precede the one containing it.

First find the class of the tag containing your keyword:

# Selenium 3 API; in Selenium 4 the equivalent is driver.find_element(By.XPATH, ...)
elt = driver.find_element_by_xpath("//*[contains(text(),'{}')]".format(keyword))
keyword_class = elt.get_attribute('class')

Next, you can find the parent or the preceding siblings using XPath.


# Find the preceding siblings of the element with the keyword's class and print their text
xpath = "//*[@class='{}']/preceding-sibling::*".format(keyword_class)
pre_siblings = driver.find_elements_by_xpath(xpath)
for sibling in pre_siblings:
    print(sibling.text)
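
If you would rather stay with plain regex, as in the question, here is a minimal sketch of one way to do it. It is not what I ended up using, and the function name last_classes_before and the sample markup are made up for illustration. The two trials in the question overshoot mainly because `.+` is greedy, so the capture group runs to the last quote it can reach instead of stopping at the end of the attribute (and `.` does not cross newlines by default). Matching `[^"]*` instead stops at the closing quote, and cutting the source at the first occurrence of the name avoids having to anchor the pattern on it. This assumes double-quoted class attributes and that the name's first occurrence is the one you care about.

```python
import re

# Matches a double-quoted class attribute and captures its value.
CLASS_ATTR = re.compile(r'class="([^"]*)"')

def last_classes_before(source, val, n=2):
    """Return up to n class values found before the first occurrence of val."""
    pos = source.find(val)
    if pos == -1:
        return []  # the name is not in the page
    # Only look at the markup that comes before the name.
    return CLASS_ATTR.findall(source[:pos])[-n:]

# Hypothetical markup, only for illustration.
sample = (
    '<div class="card">'
    '<span class="label">First name</span>'
    '<span class="value">Alice</span>'
    '</div>'
)

print(last_classes_before(sample, "Alice"))  # ['label', 'value']
```

The last item is the class of the tag that contains the name, and the one before it is the class of the tag encountered just before. On real pages an HTML parser is usually more robust than a regex, so treat this as a quick check rather than a general rule.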
