
Extracting List Values While Scraping

I'm looking through https://www.nps.gov/index.htm and trying to create a dictionary in which the keys are the state names from the drop-down menu and the values are the links to the page containing that state's information.

However, with my current code, I am getting something like this:

<li><a href="/state/wy/index.htm">Wyoming</a></li>

At my current skill level, I don't know how to extract the state name, since the element doesn't have an id, a class, or any other identifier, right?

So how would I go about achieving this? Here is my current code:

import requests
from bs4 import BeautifulSoup

state_dict = {}

url = 'https://www.nps.gov/index.htm'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
drop_down_search = soup.find('ul', class_="dropdown-menu SearchBar-keywordSearch")
state_search = drop_down_search.find_all('li', recursive=True)

for state in state_search:
    print(state)

You can use the .text property, like this:

import requests
from bs4 import BeautifulSoup

state_dict = {}

url = 'https://www.nps.gov/index.htm'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
drop_down_search = soup.find('ul', class_="dropdown-menu SearchBar-keywordSearch")
state_search = drop_down_search.find_all('li', recursive=True)

for state in state_search:
    print(state.text)

This prints only the text:

Alabama
Alaska
American Samoa
Arizona
Arkansas
...
...
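
The element needs no id or class of its own, because the text and the link can both be read from the tag itself. As a minimal sketch, assuming the single list item shown in the question is parsed on its own (the variable names here are only illustrative):

```python
from bs4 import BeautifulSoup

# The single list item from the question, parsed in isolation.
fragment = '<li><a href="/state/wy/index.htm">Wyoming</a></li>'
li = BeautifulSoup(fragment, 'html.parser').li

name = li.get_text(strip=True)  # text of the <li>, including its nested <a>
href = li.a.get('href')         # href attribute of the nested <a> tag

print(name, href)
```

Here get_text(strip=True) behaves like .text but also trims surrounding whitespace, which can matter when the real page puts line breaks inside the list items.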

To print each state name together with its link, loop over the a tags inside each list item:

for state in state_search:
    for link in state.find_all('a'):
        print("%30s ===> %s" % (link.text, link.get('href')))
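
Since the goal in the question is a dictionary rather than printed output, the same loop can fill state_dict. The following is a sketch under stated assumptions, not a definitive implementation: build_state_dict is a hypothetical helper name, the sample HTML is made up to mirror the list item shown in the question (the Alabama path is assumed), and turning the relative links into absolute ones with urljoin is an optional extra.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

BASE_URL = 'https://www.nps.gov/index.htm'

def build_state_dict(html, base_url=BASE_URL):
    """Map each state name to the absolute URL of its page (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    menu = soup.find('ul', class_='dropdown-menu SearchBar-keywordSearch')
    state_dict = {}
    if menu is None:
        # The menu was not found, e.g. because the page layout changed.
        return state_dict
    for link in menu.find_all('a', href=True):
        # Key: visible state name; value: href resolved against the page URL.
        state_dict[link.get_text(strip=True)] = urljoin(base_url, link['href'])
    return state_dict

# Made-up sample modeled on the <li> from the question (assumption).
sample_html = '''
<ul class="dropdown-menu SearchBar-keywordSearch">
  <li><a href="/state/al/index.htm">Alabama</a></li>
  <li><a href="/state/wy/index.htm">Wyoming</a></li>
</ul>
'''
state_dict = build_state_dict(sample_html)
print(state_dict)
```

Against the live site, the same function could be called as build_state_dict(requests.get(BASE_URL).text), assuming the drop-down markup still matches the class names used in the question.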
