Extracting List Values While Scraping
I'm looking through https://www.nps.gov/index.htm and trying to create a dictionary where the keys are the state names from the drop-down menu and the values are the links to each state's information page.
However, with my current code, I am getting something like this:
<li><a href="/state/wy/index.htm">Wyoming</a></li>
At my current skill level, I don't know how to extract the state name, because the element doesn't have an id, a class, or any other identifier, right?
So how would I go about achieving this? Here is my current code:
import requests
from bs4 import BeautifulSoup

state_dict = {}
url = 'https://www.nps.gov/index.htm'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
# Locate the drop-down menu, then collect every <li> inside it
drop_down_search = soup.find('ul', class_="dropdown-menu SearchBar-keywordSearch")
state_search = drop_down_search.find_all('li', recursive=True)
for state in state_search:
    print(state)
You can use the .text property. It returns the text inside a tag, so the <li> doesn't need an id or a class. Just like this:
import requests
from bs4 import BeautifulSoup
state_dict = {}
url = 'https://www.nps.gov/index.htm'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
drop_down_search = soup.find('ul', class_="dropdown-menu SearchBar-keywordSearch")
state_search = drop_down_search.find_all('li', recursive=True)
for state in state_search:
    print(state.text)
It will print only the text:
Alabama
Alaska
American Samoa
Arizona
Arkansas
...
...
# Print each state name next to the href of its link
for state in state_search:
    for link in state.find_all('a'):
        print("%30s ===> %s" % (link.text, link.get('href')))
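To get from printing to the dictionary the question asks for, something along these lines may work. It's only a minimal sketch: build_state_dict and sample_html are names I made up for illustration, the Alabama href in the sample is a guess shaped like the Wyoming one from the question, and I'm assuming the relative hrefs should be resolved against the page URL with urljoin.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = 'https://www.nps.gov/index.htm'


def build_state_dict(html, base_url=BASE_URL):
    """Map each state name in the drop-down menu to an absolute URL.

    Hypothetical helper, not part of requests or BeautifulSoup.
    """
    soup = BeautifulSoup(html, 'html.parser')
    menu = soup.find('ul', class_='dropdown-menu SearchBar-keywordSearch')
    if menu is None:
        # Menu not found, e.g. because the page markup has changed.
        return {}
    state_dict = {}
    for link in menu.find_all('a', href=True):
        name = link.get_text(strip=True)  # same text as link.text, trimmed
        state_dict[name] = urljoin(base_url, link['href'])
    return state_dict


# Assumed sample markup, modeled on the <li> shown in the question.
sample_html = '''
<ul class="dropdown-menu SearchBar-keywordSearch">
  <li><a href="/state/al/index.htm">Alabama</a></li>
  <li><a href="/state/wy/index.htm">Wyoming</a></li>
</ul>
'''

state_dict = build_state_dict(sample_html)
print(state_dict['Wyoming'])
```

With the live page you would pass requests.get(url).text instead of sample_html. If you'd rather keep the relative hrefs exactly as they appear in the page, drop the urljoin call and store link['href'] directly.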