Find all elements with specific text using Selenium in Python

I am trying to locate the elements that carry the contact number on each site. I have a routine that extracts the contact number from the page using regular expressions for the known formats, and I use the following snippet to get the matching elements:

    contact_elem = browser.find_elements_by_xpath("//*[contains(text(), '" + phone_num + "')]")

Taking https://www.cssfirm.com/ as an example, the contact number appears in two places: the top header and the bottom footer.

The markup of the two elements containing the contact number is as follows:

    <h3>CALL US TODAY AT (855) 910-7824</h3> - Footer
    <a href="tel:8559107824"> <span>Call Us<br>Today</span>&nbsp;&nbsp;(855) 910-7824</a> - Header

The extracted phone number is correct when I print it. For some reason, though, the element in the header is not detected.

I tried searching for multiple elements, and even tried deleting the footer element in the browser before running the rest of the code.

What could be the reason it goes undetected?

PS: Below is the amateurish, uncorrected code. Efficiency edits and suggestions are welcome. The same code has been tested on various other sites and works fine.

    import re

    from bs4 import BeautifulSoup
    from selenium.common.exceptions import NoSuchElementException

    # `browser` is assumed to be an already-created Selenium WebDriver (setup omitted).
    url = 'http://www.cssfirm.com/'
    browser.get(url)

    # Serialise the rendered page back to a string for the regex search.
    parsed = browser.find_element_by_tag_name('html').get_attribute('innerHTML')
    s = BeautifulSoup(parsed, 'html.parser')
    s = s.decode('utf-8')

    # Broad pattern that picks out anything resembling a phone number.
    phoneNumberRegex = r'(\s*(?:\+?(\d{1,4}))?[-. (]*(\d{1,})[-. )]*(\d{3}|[A-Z0-9]+)[-. \/]*(\d{4}|[A-Z0-9]+)[-. \/]?(\d{4}|[A-Z0-9]+)?(?: *x(\d+))?\s*)'
    # Stricter patterns for the specific formats that are accepted.
    custom_re = [r'([0-9]{4,4} )([0-9]{3,3} )([0-9]{4,4})',
                 r'([0-9]{3,3} )([0-9]{4,4} )([0-9]{4,4})',
                 r'(\+[0-9]{2,2}-)([0-9]{4,4}-)([0-9]{4,4}-)(0)',
                 r'(\([0-9]{3,3}\) )([0-9]{3,3}-)([0-9]{4,4})',
                 r'(\+[0-9]{2,2} )(\(0\)[0-9]{4,4} )([0-9]{4,6})',
                 r'([0-9]{5,5} )([0-9]{6,6})',
                 r'(\+[0-9]{2,2}\(0\))([0-9]{4,4} )([0-9]{4,4})',
                 r'(\+[0-9]{2,2} )([0-9]{3,3} )([0-9]{4,4} )([0-9]{3,3})',
                 r'([0-9]{3,3}-)([0-9]{3,3}-)([0-9]{4,4})']

    phones = re.findall(phoneNumberRegex, s)
    phone_num_list = ()
    phone_num = ''
    matched = 0

    # Stop at the first candidate number for which an element can be located.
    for phoneHeader in phones:
        #phoneHeader = phoneHeader.decode('utf-8')
        for ph_cnd in phoneHeader:
            for pttrn in custom_re:
                phones = re.findall(pttrn, ph_cnd)
                if phones:
                    phone_num_list = phones
                    for x in phone_num_list:
                        phone_num = ''.join(x)
                    try:
                        contact_elem = browser.find_element_by_xpath("//*[contains(text(), '" + phone_num + "')]")
                        phone_num_txt = contact_elem.text
                        if phone_num_txt:
                            matched = 1
                            break
                    except NoSuchElementException:
                        pass
                    if matched == 1:
                        break
            if matched == 1:
                break
        if matched == 1:
            break

    print("Phone number :", phone_num)  # perfect output
    # contact_elem: empty for the header, or just the footer element
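Since efficiency suggestions were invited, here is one possible way to flatten the nested loops and the repeated `matched` checks. It is only a sketch: `candidate_numbers`, `first_located` and the `locate` callable are hypothetical names, `locate` stands in for the Selenium lookup, only two of the patterns above are included, and the broad first-pass regex is skipped for brevity.

```python
import re

# Two of the patterns from custom_re above, written as raw strings.
CUSTOM_PATTERNS = [
    r'(\([0-9]{3}\) )([0-9]{3}-)([0-9]{4})',
    r'([0-9]{3}-)([0-9]{3}-)([0-9]{4})',
]


def candidate_numbers(text, patterns=CUSTOM_PATTERNS):
    """Yield every phone-number string matched by any pattern, in pattern order."""
    for pattern in patterns:
        for match in re.finditer(pattern, text):
            yield match.group(0)


def first_located(candidates, locate):
    """Return (number, element) for the first candidate that locate() finds.

    `locate` is a hypothetical callable wrapping the element lookup; it is
    expected to return an element, or None when nothing matches.
    """
    for number in candidates:
        element = locate(number)
        if element is not None:
            return number, element
    return None, None


print(list(candidate_numbers('CALL US TODAY AT (855) 910-7824')))
```

Returning from a helper replaces the chain of `break` statements, and the generator means later patterns are only tried if earlier candidates could not be located.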

EDIT

Code updated; I had forgotten an important piece. There are also sleep calls in between to give the page time to load. They are trivial, so I have left them out to keep the code quick to read.

I found a temporary workaround by searching for the partial link text, since the number also appears in the link:

    contact_elem2 = browser.find_element_by_partial_link_text(phone_num)

However, this does not answer the general question of why that text was ignored within the element.
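One likely explanation, offered as a sketch rather than a confirmed diagnosis: in XPath 1.0, `contains(text(), ...)` converts the `text()` node-set to a string by taking only its first node. Assuming the header markup is exactly as quoted above (including the whitespace before `<span>`), the first text node of the `<a>` is just that whitespace, and the number sits in a later text node after the `<span>`. The example below reproduces the split with the standard-library XML parser; the markup is made well-formed for it (`<br/>` closed, `&nbsp;` written as `\u00a0`), and `xpath_any_text_node` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Header and footer markup from the question, adjusted to be well-formed XML.
HEADER = '<a href="tel:8559107824"> <span>Call Us<br/>Today</span>\u00a0\u00a0(855) 910-7824</a>'
FOOTER = '<h3>CALL US TODAY AT (855) 910-7824</h3>'
PHONE = '(855) 910-7824'


def first_text_node(markup):
    """Text before the first child element (ElementTree exposes it as .text)."""
    return ET.fromstring(markup).text or ''


def all_text(markup):
    """All text inside the element, including text after child elements."""
    return ''.join(ET.fromstring(markup).itertext())


def xpath_any_text_node(text):
    """XPath that tests every direct text node, not only the first (hypothetical helper)."""
    return "//*[text()[contains(., '" + text + "')]]"


print(PHONE in first_text_node(FOOTER))  # True: the number is in the first text node
print(PHONE in first_text_node(HEADER))  # False: the first text node is only whitespace
print(PHONE in all_text(HEADER))         # True: the number follows the <span>
print(xpath_any_text_node(PHONE))
```

If this is indeed the cause, an expression of the form produced by `xpath_any_text_node` should match the header link as well as the footer heading. The simpler `//*[contains(., '...')]` would also match, but it additionally returns every ancestor of those elements, such as `body` and `html`.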
