
Python BeautifulSoup scraping dropdowns

I'm trying to scrape the search results on this link: https://www.inecnigeria.org/elections/polling-units/ which requires that I select a value from a dropdown, after which another dropdown shows up that I have to select from before searching. I was able to get the values from the first dropdown, but not from the others. This is what I have so far:

from bs4 import BeautifulSoup
import requests

base = 'https://www.inecnigeria.org/elections/polling-units/'

base_req = requests.get(base, verify=False)

soup = BeautifulSoup( base_req.text, "html.parser" )

# states
states = soup.find('select', id = "statePoll")

stateItems = states.select('option[value]')

stateValues = [ stateItem.text for stateItem in stateItems ]


# print(stateValues)

lgas = soup.find('select', id = "lgaPoll")

lgaItems = lgas.select('option[value]')

lgaValues = [ lgaItem.text for lgaItem in lgaItems ]


print(lgas)

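Actually, you can't get those values by scraping the HTML of that page. The page uses JavaScript to request the options from another page and inserts them into the page dynamically. You will have to make those requests yourself, using the information you can scrape. Here is an example of how to take the next step, which should show you the general idea: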

from bs4 import BeautifulSoup
import requests

base = 'https://www.inecnigeria.org/elections/polling-units/'
# Endpoint the page's JavaScript calls to load the LGA options for a state
lga_view = 'https://www.inecnigeria.org/wp-content/themes/independent-national-electoral-commission/custom/views/lgaView.php'
base_req = requests.get(base, verify=False)
soup = BeautifulSoup(base_req.text, "html.parser")

# Map each state name to its numeric id, taken from the <option> values
states = soup.find('select', id="statePoll")
state_options = states.find_all('option')
states = {opt.text: int(opt['value']) for opt in state_options if 'value' in opt.attrs}

# Request the LGA list for every state; the response body is parsed as JSON
lga = {k: requests.post(lga_view, data={'state_id': v}, verify=False).json() for k, v in states.items()}

print(lga)
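
If you run this against all states, one failed request or one placeholder option can abort the whole dict comprehension. Below is a minimal sketch of a more forgiving version of the same idea. The helper names `parse_state_options` and `collect_lgas` are made up for illustration, and it assumes (I haven't verified this against the live page) that a placeholder option such as "Select State" may carry an empty `value`:

```python
from typing import Any, Callable, Dict, Iterable, Mapping, Optional, Tuple


def parse_state_options(pairs: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, int]:
    """Turn (text, value) pairs into {name: id}, dropping non-numeric values.

    Hypothetical helper: options with a missing or empty value (assumed to be
    placeholders) are skipped instead of crashing int().
    """
    states = {}
    for text, value in pairs:
        value = (value or "").strip()
        if value.isdecimal():
            states[text.strip()] = int(value)
    return states


def collect_lgas(states: Mapping[str, int],
                 fetch: Callable[[int], Any]) -> Dict[str, Any]:
    """Call fetch(state_id) for every state and keep going if one call fails.

    Hypothetical helper: `fetch` is whatever function performs the POST and
    returns the decoded JSON. Failed states are stored as None.
    """
    results = {}
    for name, state_id in states.items():
        try:
            results[name] = fetch(state_id)
        except (OSError, ValueError) as exc:
            # requests' network errors derive from OSError; a body that is
            # not valid JSON surfaces as a ValueError subclass.
            print(f"skipping {name!r} (state_id={state_id}): {exc}")
            results[name] = None
    return results
```

To plug it into the script above, you could build the pairs with `[(opt.text, opt.get('value')) for opt in state_options]` and pass something like `lambda sid: requests.post(lga_view, data={'state_id': sid}, verify=False, timeout=30).json()` as `fetch`. Passing the request in as a function also lets you try the logic with a fake `fetch` before hitting the site.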

