How to scrape drop-down menu options using Selenium Webdriver with Python?
I use Selenium Webdriver to test a website with a drop-down menu that offers different options to different users. The number of options and their values are always different. When I look at the source, I see the code below. Could you please provide an example of how, in Python, I can scrape it and make a list of all the available option values?
<div _ngcontent-pxo-26="" class="col-md-6">
<div _ngcontent-pxo-26="" class="form-group">
<label _ngcontent-pxo-26="" for="Filter_ClientRegion">Region</label>
<select _ngcontent-pxo-26="" class="form-control ng-pristine ng-valid ng-touched" id="Filter_ClientRegion">
<option _ngcontent-pxo-26="" value="">All</option>
<!--template bindings={}--
<option _ngcontent-pxo-26="" value="A">A</option>
<option _ngcontent-pxo-26="" value="B">B</option>
<option _ngcontent-pxo-26="" value="C">C</option>
<option _ngcontent-pxo-26="" value="D">D</option>
<option _ngcontent-pxo-26="" value="E">E</option>
<option _ngcontent-pxo-26="" value="F">F</option>
<option _ngcontent-pxo-26="" value="G">G</option>
</select>
</div>
</div>
To select a specific option, you can use something like:
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get("some.site")
# Selenium 4 locator style; older releases used find_element_by_id(...)
el = driver.find_element(By.ID, 'Filter_ClientRegion')
for option in el.find_elements(By.TAG_NAME, 'option'):
    if option.text == 'A':  # or B or C...
        option.click()  # select() for older versions
        break
To get the values of the options, you can use:
from selenium.webdriver.common.by import By

options = []
driver.get("some.site")
el = driver.find_element(By.ID, 'Filter_ClientRegion')
for option in el.find_elements(By.TAG_NAME, 'option'):
    options.append(option.get_attribute("value"))
# print(options)
# e.g. ['', 'A', 'B', 'C', ...]
Notes:
1. I cannot fully test the code above because I don't have the complete source code.
2. Please note that the option elements are inside a comment block (<!--template bindings={}--), so you may not be able to retrieve their values.
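The value-collecting loop above can also be wrapped in a small helper. The following is a minimal sketch and not part of the original answer: option_values is a hypothetical name, and FakeOption and FakeSelect are stand-ins so that the snippet runs without a browser. It assumes the locator string "tag name" is what Selenium's By.TAG_NAME resolves to. With a real driver you would pass in the element found for Filter_ClientRegion instead of the fake.

```python
def option_values(select_element, skip_blank=False):
    """Return the value attribute of every <option> under select_element."""
    values = []
    # "tag name" is the locator strategy string behind Selenium's By.TAG_NAME
    for option in select_element.find_elements("tag name", "option"):
        value = option.get_attribute("value")
        if skip_blank and not value:
            continue  # e.g. the placeholder <option value="">All</option>
        values.append(value)
    return values


# Hypothetical stand-ins that mimic the two WebElement methods used above,
# so the sketch can be run without launching a browser.
class FakeOption:
    def __init__(self, value):
        self._value = value

    def get_attribute(self, name):
        return self._value if name == "value" else None


class FakeSelect:
    def __init__(self, values):
        self._options = [FakeOption(v) for v in values]

    def find_elements(self, by, value):
        if (by, value) == ("tag name", "option"):
            return list(self._options)
        return []


fake = FakeSelect(["", "A", "B", "C"])
print(option_values(fake))                   # ['', 'A', 'B', 'C']
print(option_values(fake, skip_blank=True))  # ['A', 'B', 'C']
```

The skip_blank flag is only a convenience for dropping the empty "All" entry; whether that entry counts as an available option depends on what you are testing.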
It should be pretty easy.
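from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import Select, WebDriverWait

array_options = []
# visibility_of_element_located takes a single (by, value) locator tuple
element = WebDriverWait(self.driver, timeout=wait_time).until(
    EC.visibility_of_element_located((By.ID, "Filter_ClientRegion")))
if element.tag_name == 'select':
    select = Select(element)
    dropdown_options = select.options
    for option in dropdown_options:
        array_options.append(option.text)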
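Since the goal in the question is to test which options each user sees, a list like array_options can then be compared with the options that user is expected to have. This is a minimal sketch in plain Python and not part of the original answer; diff_options and the sample lists are hypothetical.

```python
def diff_options(actual, expected):
    """Return (missing, unexpected) option labels, each as a sorted list."""
    actual_set, expected_set = set(actual), set(expected)
    missing = sorted(expected_set - actual_set)      # expected but not shown
    unexpected = sorted(actual_set - expected_set)   # shown but not expected
    return missing, unexpected


# Made-up data: what was scraped versus what this user should see.
scraped = ["All", "A", "B", "C"]
expected_for_user = ["All", "A", "B", "D"]

missing, unexpected = diff_options(scraped, expected_for_user)
print(missing)     # ['D']
print(unexpected)  # ['C']
```

Comparing sets ignores ordering and duplicates; if the order of the options matters for your test, compare the lists directly instead.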
You can do this with BeautifulSoup.
Since you mentioned Selenium, this code begins by using it, in case you need it to get past a login or something else that requires Selenium. If you don't need Selenium, you can skip down to the line where soup is made using BeautifulSoup. The preceding code just shows how to use Selenium to get the page source so that BeautifulSoup can access it.
First find the select tag that contains all of the HTML code, including the commented stuff. Then take each item in its contents, convert it to a string, concatenate the strings into one big string, and prepend <select>. Turn this big string into soup and findAll the option tags within it. Display whatever you want from each of these tags.
>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
>>> driver.get("some.site")  # placeholder URL; load the page before reading page_source
>>> content = driver.page_source
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(content, 'lxml')
>>> select = soup.find('select', attrs={'id': 'Filter_ClientRegion'})
>>> items = []
>>> for item in select.contents:
...     items.append(str(item).strip())
...
>>> items
['', '<option _ngcontent-pxo-26="" value="">All</option>', '', 'template bindings={}--\n <option _ngcontent-pxo-26="" value="A">A</option>\n <option _ngcontent-pxo-26="" value="B">B</option>\n <option _ngcontent-pxo-26="" value="C">C</option>\n <option _ngcontent-pxo-26="" value="D">D</option>\n <option _ngcontent-pxo-26="" value="E">E</option>\n <option _ngcontent-pxo-26="" value="F">F</option>\n <option _ngcontent-pxo-26="" value="G">G</option>\n </select>\n </div>\n</div>']
>>> newContents = '<select>' + ''.join(items).replace('--','')
>>> newSelectSoup = BeautifulSoup(newContents, 'lxml')
>>> options = newSelectSoup.findAll('option')
>>> len(options)
8
>>> for option in options:
...     option.attrs['value']
...
''
'A'
'B'
'C'
'D'
'E'
'F'
'G'
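If installing BeautifulSoup and lxml is not an option, the same idea (strip the comment markers, re-parse, collect the option tags) can be approximated with the standard library alone. The following is a rough sketch that swaps in html.parser for BeautifulSoup; OptionCollector, option_values and SAMPLE_HTML are hypothetical names, and the sample markup is a shortened copy of the snippet from the question. Like the session above, it bluntly removes every "--", which would also mangle legitimate text containing that sequence.

```python
from html.parser import HTMLParser


class OptionCollector(HTMLParser):
    """Collect the value attribute of each <option> inside one <select>."""

    def __init__(self, select_id):
        super().__init__()
        self.select_id = select_id
        self.in_select = False
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("id") == self.select_id:
            self.in_select = True
        elif tag == "option" and self.in_select:
            self.values.append(attrs.get("value", ""))

    def handle_endtag(self, tag):
        if tag == "select":
            self.in_select = False


def option_values(html, select_id):
    # Remove comment delimiters so markup hidden in a comment is parsed as tags.
    cleaned = html.replace("<!--", "").replace("-->", "").replace("--", "")
    parser = OptionCollector(select_id)
    parser.feed(cleaned)
    parser.close()
    return parser.values


SAMPLE_HTML = """
<select _ngcontent-pxo-26="" class="form-control" id="Filter_ClientRegion">
<option _ngcontent-pxo-26="" value="">All</option>
<!--template bindings={}--
<option _ngcontent-pxo-26="" value="A">A</option>
<option _ngcontent-pxo-26="" value="B">B</option>
</select>
"""

print(option_values(SAMPLE_HTML, "Filter_ClientRegion"))  # ['', 'A', 'B']
```

With Selenium, the html argument would be driver.page_source, as in the session above.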