Strange BeautifulSoup soup.findAll error: not working inside a function

Question

I'm trying to build a very simple scraper to harvest links as part of a crawler project. I set up the following function to do the scraping:

import requests as rq 
from bs4 import BeautifulSoup

def getHomepageLinks(page):
    homepageLinks = []
    response = rq.get(page)
    text = response.text
    soup = BeautifulSoup(text)
    for a in soup.findAll('a'):
        homepageLinks.append(a['href'])
    return homepageLinks

I saved this file as "scraper2.py". When I try to run the code, I get the following error:

>>> import scraper2 as sc
>>> sc.getHomepageLinks('http://washingtonpost.com')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "scraper2.py", line 9, in getHomepageLinks
    for a in soup.findAll('a'):
TypeError: 'NoneType' object is not callable

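Now for the strange part: if I try to debug the code by running the same steps interactively and just printing each href, it works fine: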

>>> response = rq.get('http://washingtonpost.com')
>>> text = response.text
>>> soup = BeautifulSoup(text)
>>> for a in soup.findAll('a'):
...     print(a['href'])
... 
https://www.washingtonpost.com
#
#
http://www.washingtonpost.com/politics/
https://www.washingtonpost.com/opinions/
http://www.washingtonpost.com/sports/
http://www.washingtonpost.com/local/
http://www.washingtonpost.com/national/
http://www.washingtonpost.com/world/
...

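If I'm reading the error message correctly, the problem is with soup.findAll, but only when findAll is part of a function. I'm sure I'm spelling it correctly (not findall or Findall, as in many of the similar errors posted here), and I've tried the fix suggested in an earlier post of using lxml, but that didn't solve the problem. Does anyone have any ideas?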

Answer 1

Try replacing your for loop with the following:

for a in soup.findAll('a'):
    # .get returns None instead of raising KeyError when the tag has no href
    url = a.get("href")
    if url is not None:
        homepageLinks.append(url)
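
For context, here is a minimal, self-contained sketch of that loop wrapped in a function. It assumes bs4 is installed and uses Python's built-in "html.parser"; the name extract_links and the sample HTML are hypothetical and not from the original post. It parses a literal string rather than fetching a page, so the behaviour is easy to check offline. find_all is the bs4 spelling of findAll.

```python
from bs4 import BeautifulSoup


def extract_links(html):
    """Return the href of every <a> tag in html that actually has one."""
    # Naming the parser explicitly avoids the "no parser specified" warning.
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.find_all("a"):
        # a["href"] would raise KeyError for an anchor without href;
        # a.get("href") returns None in that case instead.
        url = a.get("href")
        if url is not None:
            links.append(url)
    return links


# Hypothetical sample markup: the second anchor has no href attribute.
sample_html = """
<a href="http://www.washingtonpost.com/politics/">Politics</a>
<a name="top">No href here</a>
<a href="#">Placeholder</a>
"""

print(extract_links(sample_html))
```

The anchor without an href is skipped rather than stopping the loop. Whether this also clears the TypeError from the question probably depends on what soup really is inside scraper2.py, which the post does not show.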

0 2016-01-13 21:48:21
