Find all elements with specific text in Selenium (Python)
I am trying to locate the elements that contain the contact number on each site. I was able to write the routine that gets the numbers: it extracts the contact number using the known formats and regexes, and then uses the following code snippet to get the element:
```python
from selenium.webdriver.common.by import By

contact_elem = browser.find_elements(By.XPATH, "//*[contains(text(), '" + phone_num + "')]")
```
Take https://www.cssfirm.com/ as an example: the contact number appears in two places, the header at the top and the footer at the bottom.
The elements containing the contact number look like this:
```html
<!-- Footer -->
<h3>CALL US TODAY AT (855) 910-7824</h3>

<!-- Header -->
<a href="tel:8559107824"> <span>Call Us<br>Today</span> (855) 910-7824</a>
```
When printed, the extracted phone number matches exactly. For some reason, however, the element in the header is not detected.
I tried searching for the elements, and even deleted the footer element from the browser before running the rest of the code.
What could be the reason it goes undetected?
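A likely explanation, offered as an assumption and not confirmed in the original post: browsers evaluate XPath 1.0, where `contains()` takes string arguments and a node-set such as `text()` is converted to the string value of its *first* node in document order only. The footer `<h3>` has a single text node, so it matches. In the header snippet, the `<a>` starts with a whitespace text node before `<span>`, so `contains(text(), ...)` only ever sees that whitespace and the number in the later text node is ignored. Testing each text child with `text()[contains(., ...)]` avoids this. The sketch below only builds the XPath strings; the helper names are hypothetical, and it also quotes the search value safely, since XPath 1.0 has no escape syntax for quotes.

```python
def xpath_literal(value: str) -> str:
    """Quote value as an XPath 1.0 string literal (hypothetical helper)."""
    if "'" not in value:
        return f"'{value}'"
    if '"' not in value:
        return f'"{value}"'
    # Both quote kinds present: split on ' and rejoin the pieces with concat()
    parts = [f"'{part}'" for part in value.split("'")]
    return "concat(" + ", \"'\", ".join(parts) + ")"


def first_text_node_contains(value: str) -> str:
    """The question's query: contains() only sees the FIRST text child of each element."""
    return f"//*[contains(text(), {xpath_literal(value)})]"


def any_text_node_contains(value: str) -> str:
    """Matches elements where ANY direct text child contains value."""
    return f"//*[text()[contains(., {xpath_literal(value)})]]"
```

With Selenium 4, `browser.find_elements(By.XPATH, any_text_node_contains(phone_num))` should then return both the footer `<h3>` and the header `<a>`, assuming the live page has the same structure as the quoted snippet. The looser `//*[contains(., ...)]` also finds the number, but because `.` is an element's whole string value it matches every ancestor as well (`<html>`, `<body>`, and so on).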
PS: Below is the amateurish, uncorrected code. Suggestions for making it more efficient are welcome. The same code has been tested on various sites and works fine.
```python
import re

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

# find_element(By..., ...) is the current API; find_element_by_* was removed in Selenium 4.3
browser = webdriver.Chrome()  # assumption: any WebDriver instance will do

url = 'http://www.cssfirm.com/'
browser.get(url)
parsed = browser.find_element(By.TAG_NAME, 'html').get_attribute('innerHTML')
soup = BeautifulSoup(parsed, 'html.parser')
# str() returns the markup as text; the first argument of BeautifulSoup.decode() is not an encoding
s = str(soup)

# Broad pattern that collects phone-number-like candidates from the page source
phoneNumberRegex = r'(\s*(?:\+?(\d{1,4}))?[-. (]*(\d{1,})[-. )]*(\d{3}|[A-Z0-9]+)[-. \/]*(\d{4}|[A-Z0-9]+)[-. \/]?(\d{4}|[A-Z0-9]+)?(?: *x(\d+))?\s*)'
# Known formats used to pick the real number out of each candidate (raw strings avoid escape warnings)
custom_re = [r'([0-9]{4,4} )([0-9]{3,3} )([0-9]{4,4})',
             r'([0-9]{3,3} )([0-9]{4,4} )([0-9]{4,4})',
             r'(\+[0-9]{2,2}-)([0-9]{4,4}-)([0-9]{4,4}-)(0)',
             r'(\([0-9]{3,3}\) )([0-9]{3,3}-)([0-9]{4,4})',
             r'(\+[0-9]{2,2} )(\(0\)[0-9]{4,4} )([0-9]{4,6})',
             r'([0-9]{5,5} )([0-9]{6,6})',
             r'(\+[0-9]{2,2}\(0\))([0-9]{4,4} )([0-9]{4,4})',
             r'(\+[0-9]{2,2} )([0-9]{3,3} )([0-9]{4,4} )([0-9]{3,3})',
             r'([0-9]{3,3}-)([0-9]{3,3}-)([0-9]{4,4})']

phones = re.findall(phoneNumberRegex, s)
phone_num = ''
contact_elem = None  # stays None if no element is found
matched = 0
for phoneHeader in phones:
    # each findall() result is a tuple of the broad pattern's capture groups
    for ph_cnd in phoneHeader:
        for pttrn in custom_re:
            matches = re.findall(pttrn, ph_cnd)
            if matches:
                for x in matches:
                    # the groups cover the whole format, so joining them rebuilds the number
                    phone_num = ''.join(x)
                    try:
                        contact_elem = browser.find_element(
                            By.XPATH, "//*[contains(text(), '" + phone_num + "')]")
                        if contact_elem.text:
                            matched = 1
                            break
                    except NoSuchElementException:
                        pass
            if matched == 1:
                break
        if matched == 1:
            break
    if matched == 1:
        break

print("Phone number :", phone_num)  # <-- perfect output
print(contact_elem)  # <-- never the header element; at most the footer element
```
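Since efficiency suggestions were invited, here is a simplified sketch of the extraction step. It is an alternative, not equivalent to the code above: it drops the broad `phoneNumberRegex` pre-pass and runs the `custom_re` formats directly over the page text, so results may differ on some pages. The capture groups are removed because `''.join()` over them only rebuilt the whole match, which `group(0)` already gives. The names `CUSTOM_RE` and `phone_candidates` are hypothetical.

```python
import re
from typing import List

# The custom_re formats without capture groups, compiled once
CUSTOM_RE = [
    r'[0-9]{4} [0-9]{3} [0-9]{4}',
    r'[0-9]{3} [0-9]{4} [0-9]{4}',
    r'\+[0-9]{2}-[0-9]{4}-[0-9]{4}-0',
    r'\([0-9]{3}\) [0-9]{3}-[0-9]{4}',
    r'\+[0-9]{2} \(0\)[0-9]{4} [0-9]{4,6}',
    r'[0-9]{5} [0-9]{6}',
    r'\+[0-9]{2}\(0\)[0-9]{4} [0-9]{4}',
    r'\+[0-9]{2} [0-9]{3} [0-9]{4} [0-9]{3}',
    r'[0-9]{3}-[0-9]{3}-[0-9]{4}',
]
COMPILED = [re.compile(p) for p in CUSTOM_RE]


def phone_candidates(text: str) -> List[str]:
    """Distinct phone numbers found in text, in order of first appearance."""
    spans = []
    for pattern in COMPILED:
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), m.group(0)))
    # Earliest match first; at the same position prefer the longer match
    spans.sort(key=lambda s: (s[0], -(s[1] - s[0])))
    result: List[str] = []
    last_end = -1
    for start, end, number in spans:
        if start < last_end:  # overlaps a match already accepted: skip it
            continue
        last_end = end
        if number not in result:  # keep each number only once
            result.append(number)
    return result
```

Each candidate can then be tried in turn with a single `browser.find_elements(By.XPATH, ...)` call, which replaces the four nested loops and the `matched` flags with one loop and an early `break`.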
EDIT

Code updated; I had forgotten an important piece. There are also sleeps in between to give the page time to load. Since they are trivial, I left them out to keep the code quick to read.
I found a temporary workaround by searching by partial link text, since the number also appears in the link:
```python
from selenium.webdriver.common.by import By

contact_elem2 = browser.find_element(By.PARTIAL_LINK_TEXT, phone_num)
```
However, this does not answer the general question of why the text inside that element was ignored.
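The partial-link-text workaround relies on the header number sitting inside a link. A related sketch, not from the original post: the link's `href` (`tel:8559107824` in the header snippet) can also be matched directly by reducing the extracted number to its digits. This assumes the `href` holds only digits after `tel:`, possibly with a country-code prefix; a form like `tel:+1-855-910-7824` would not match. The helper names are hypothetical, and the code only builds the XPath string.

```python
import re


def digits_only(phone: str) -> str:
    """Strip everything except digits, e.g. '(855) 910-7824' becomes '8559107824'."""
    return re.sub(r'\D', '', phone)


def tel_link_xpath(phone: str) -> str:
    """XPath for <a> elements whose tel: href ends with the number's digits."""
    digits = digits_only(phone)
    if not digits:
        raise ValueError(f"no digits in {phone!r}")
    # XPath 1.0 has no ends-with(), so compare the tail of @href using substring()
    return ("//a[starts-with(@href, 'tel:') and "
            f"substring(@href, string-length(@href) - {len(digits) - 1}) = '{digits}']")
```

Matching the tail rather than the whole `href` lets a prefix such as `tel:+18559107824` still match. This only helps for numbers wrapped in a `tel:` link; the footer `<h3>` would still need a text-based lookup.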