Unable to parse "href" from a string in BeautifulSoup4

Question

I have this code snippet:

from bs4 import BeautifulSoup

myString = '<a href="/number-stations/german/g06" title="G06">G06</a>'

i = BeautifulSoup(str(myString), 'html.parser')
print(type(i))
print(i)
myText = i.get_text(strip=True)
print(myText)
myURL = i["href"]
print(myURL)

The idea is to parse the href out of this string.

However, I don't understand why it can't see it. My output:

<class 'bs4.BeautifulSoup'>
<a href="/number-stations/german/g06" title="G06">G06</a>
G06
Traceback (most recent call last):
  File "c:\Users\user\Desktop\aaa\test.py", line 10, in <module>
    myURL = i["href"]
  File "C:\ProgramData\Anaconda3\lib\site-packages\bs4\element.py", line 1401, in __getitem__
    return self.attrs[key]
KeyError: 'href'

Why can't BeautifulSoup see the href in this string?

Answer 1

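When you write i["href"], you are treating i as if it were the <a> tag, but it is not. i is the BeautifulSoup object for the whole parsed document, and it has no attributes of its own, hence the KeyError. Dictionary-style access works on a Tag, so you first have to locate the tag with the .find() method.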

from bs4 import BeautifulSoup

myString = '<a href="/number-stations/german/g06" title="G06">G06</a>'

soup = BeautifulSoup(myString, 'html.parser')

print(soup.find('a').attrs)
print('-' * 10)
print(soup.find('a')['href'])

Output:

{'href': '/number-stations/german/g06', 'title': 'G06'}
----------
/number-stations/german/g06
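
As a small follow-up sketch (not part of the original answer), the snippet below shows why the KeyError appears and two shorthand forms that may be handy: soup.a as an equivalent of soup.find('a'), and Tag.get(), which returns None for a missing attribute instead of raising. The variable names and the "id" attribute lookup are only illustrative.

```python
from bs4 import BeautifulSoup

html = '<a href="/number-stations/german/g06" title="G06">G06</a>'
soup = BeautifulSoup(html, "html.parser")

# The BeautifulSoup object stands for the whole document, so it carries
# no attributes of its own; that is why soup["href"] raises KeyError.
document_attrs = soup.attrs

# soup.a is shorthand for soup.find("a") and returns the first <a> Tag.
link = soup.a

# Tag.get() returns None instead of raising when an attribute is absent.
href = link.get("href")
missing = link.get("id")  # illustrative: this tag has no "id" attribute

print(document_attrs)
print(href)
print(missing)
```

Using .get() is a reasonable choice when some links in real pages might lack an href, since it avoids wrapping every lookup in try/except.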

Answer 1
2 Accepted 2020-12-13 20:33:10
