Keep getting "TypeError: 'NoneType' object is not callable" with Beautiful Soup and Python 3

Question

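I'm a beginner struggling through a course, so this may be a really simple problem, but I'm running this (admittedly messy) code (saved in the file x.py) to extract a link and a name from a website whose lines look like this: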

<li style="margin-top: 21px;">
  <a href="http://py4e-data.dr-chuck.net/known_by_Prabhjoit.html">Prabhjoit</a>
</li>

So I set this up:

import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = input('Enter - ')
html = urllib.request.urlopen(url, context=ctx).read()
soup = BeautifulSoup(html, 'html.parser')
for line in soup:
    if not line.startswith('<li'):
        continue
    stuff = line.split('"')
    link = stuff[3]
    thing = stuff[4].split('<')
    name = thing[0].split('>')
    count = count + 1
    if count == 18:
        break
print(name[1])
print(link)

It keeps producing this error:

Traceback (most recent call last):
  File "x.py", line 15, in <module>
    if not line.startswith('<li'):
TypeError: 'NoneType' object is not callable

I've been struggling with this for hours, and I'd be grateful for any suggestions.

Answer 1

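line is not a string, and it has no startswith() method. It is a BeautifulSoup Tag object, because BeautifulSoup has already parsed the HTML source text into a rich object model. Don't try to treat it as text!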

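The error arises because when you access an attribute that a Tag object doesn't know about, it searches for a child element with that name (so here it executes line.find('startswith')), and since there is no element with that name, None is returned. Calling that result, None('<li'), then fails with the error you see.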
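The mechanism can be reproduced without BeautifulSoup at all. The sketch below uses a hypothetical ToyTag class, a toy stand-in written only for illustration and not BeautifulSoup's actual implementation, which falls back to a child lookup for unknown attributes in roughly the way described above:

```python
class ToyTag:
    """Toy stand-in for a parsed element; NOT BeautifulSoup's real Tag class."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def find(self, name):
        # Return the first direct child with the given tag name, else None.
        for child in self.children:
            if child.name == name:
                return child
        return None

    def __getattr__(self, attr):
        # Python calls this only when normal attribute lookup fails.
        if attr.startswith('__'):
            raise AttributeError(attr)
        return self.find(attr)  # unknown attribute -> child lookup


li = ToyTag('li', [ToyTag('a')])
print(li.a.name)      # 'a' is a child element, so the lookup succeeds
print(li.startswith)  # no child is named 'startswith', so this is None

try:
    li.startswith('<li')  # same as None('<li')
except TypeError as exc:
    message = str(exc)
    print(message)
```

The final call fails because the attribute lookup quietly succeeds with None, and None is then called like a function. That is why the traceback reports a TypeError rather than an AttributeError.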

If you want to find the 18th <li> element, just ask BeautifulSoup for that specific element:

soup = BeautifulSoup(html, 'html.parser')
li_link_elements = soup.select('li a[href]', limit=18)
if len(li_link_elements) == 18:
    last = li_link_elements[-1]
    print(last.get_text())
    print(last['href'])

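This uses a CSS selector to find only <a> link elements that are nested inside a <li> element and have an href attribute. The search is limited to 18 such tags, and the last one is printed, but only if the page actually contained 18.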

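The element text is retrieved with the Tag.get_text() method, which includes text from any nested elements (such as <span>, <strong>, or other extra markup), and the href attribute is accessed with standard indexing notation.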
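To try the approach without fetching a page, here is a minimal sketch that wraps the same select() call in a helper and runs it on inline markup. The helper name nth_list_link, the SAMPLE markup, and the example.com URLs are all made up for illustration, and the bs4 package is assumed to be installed:

```python
from bs4 import BeautifulSoup


def nth_list_link(html, n):
    """Return (text, href) of the n-th <a href> inside an <li>, or None."""
    if n < 1:
        return None  # positions are counted from 1
    soup = BeautifulSoup(html, 'html.parser')
    links = soup.select('li a[href]', limit=n)
    if len(links) < n:
        return None  # the page has fewer than n matching links
    last = links[-1]
    return last.get_text(), last['href']


# Made-up sample markup; the second link contains nested <strong> markup.
SAMPLE = """
<ul>
  <li><a href="http://example.com/a.html">Alice</a></li>
  <li><a href="http://example.com/b.html">Bob <strong>Smith</strong></a></li>
  <li>No link here</li>
</ul>
"""

print(nth_list_link(SAMPLE, 2))
print(nth_list_link(SAMPLE, 3))
```

The second link shows get_text() joining the text of the nested <strong> element with the rest of the link text, and the request for a third link returns None because only two <li> elements contain a link.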

Answer 1: accepted, 2018-08-27 17:06:48
