Python list object has no attribute error
I am new to Python and I am trying to write a website scraper to get links from subreddits, which I can then pass to another class later on to automatically download images from Imgur.
In this code snippet, I am just trying to read the subreddit and scrape any Imgur links from the hrefs, but I get the following error:
AttributeError: 'list' object has no attribute 'timeout'
Any idea why this might be happening? Here is the code:
from bs4 import BeautifulSoup
from urllib2 import urlopen
import sys
from urlparse import urljoin

def get_category_links(base_url):
    url = base_url
    html = urlopen(url)
    soup = BeautifulSoup(html)
    posts = soup('a',{'class':'title may-blank loggedin outbound'})
    #get the links with the class "title may-blank "
    #which is how reddit defines posts
    for post in posts:
        print post.contents[0]
        #print the post's title
        if post['href'][:4] =='http':
            print post['href']
        else:
            print urljoin(url,post['href'])
        #print the url.
        #if the url is a relative url,
        #print the absolute url.

get_category_links(sys.argv)
Look at how you call the function:
get_category_links(sys.argv)
sys.argv here is a list of script arguments where the first item is the script name itself. This means that your base_url argument value is a list, which causes urlopen to fail:
>>> from urllib2 import urlopen
>>> urlopen(["I am", "a list"])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
           │           │    │     └ <object object at 0x105e2c120>
           │           │    └ None
           │           └ ['I am', 'a list']
           └ <urllib2.OpenerDirector instance at 0x105edc638>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 422, in open
    req.timeout = timeout
    │             └ <object object at 0x105e2c120>
    └ ['I am', 'a list']
AttributeError: 'list' object has no attribute 'timeout'
You meant to get the second item of sys.argv (the first command-line argument after the script name) and pass it to get_category_links:
get_category_links(sys.argv[1])
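Indexing sys.argv[1] directly raises an IndexError when the script is run without arguments. A minimal sketch of a more defensive approach follows. The helper name base_url_from_argv and the usage message are hypothetical and not part of the original code:

```python
import sys


def base_url_from_argv(argv):
    """Return the first command-line argument (hypothetical helper).

    argv is expected to look like sys.argv: the script name first,
    followed by the arguments the user typed.
    """
    if len(argv) < 2:
        # Exit with a readable message instead of an IndexError.
        raise SystemExit("usage: %s <subreddit-url>" % argv[0])
    return argv[1]
```

With this in place, the call at the bottom of the script could become get_category_links(base_url_from_argv(sys.argv)).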
It's interesting, though, how cryptic and difficult to understand the error is in this case. It comes from the way the "URL opener" works in Python 2.7. If the url value (the first argument) is not a string, it assumes it is a Request instance and tries to set a timeout value on it:
def open(self, fullurl, data=None, timeout=socket._GLOBAL_DEFAULT_TIMEOUT):
    # accept a URL or a Request object
    if isinstance(fullurl, basestring):
        req = Request(fullurl, data)
    else:
        req = fullurl
        if data is not None:
            req.add_data(data)
    req.timeout = timeout # <-- FAILS HERE
Note that this behavior had not changed as of Python 3.6, the latest stable release at the time of writing.
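For readers on Python 3, here is a rough sketch of the same failure using urllib.request, together with a guard that turns it into a clearer error. The names open_url and urlopen_error_for are hypothetical helpers made up for this illustration. The guard deliberately rejects everything except strings, including Request objects that urlopen itself would accept, which is an assumption that only suits simple scripts like this one:

```python
from urllib.request import urlopen


def open_url(url, timeout=10):
    """Hypothetical wrapper: reject non-string URLs before calling urlopen."""
    if not isinstance(url, str):
        raise TypeError("url must be a string, got %s" % type(url).__name__)
    return urlopen(url, timeout=timeout)


def urlopen_error_for(value):
    """Return the AttributeError urlopen raises for a value such as a list.

    The failure happens before any network access, when the opener
    tries to set a timeout attribute on the value.
    """
    try:
        urlopen(value)
    except AttributeError as exc:
        return exc
    return None
```

Calling urlopen_error_for(["I am", "a list"]) should return the same kind of AttributeError as in the traceback above, while open_url(sys.argv) would fail immediately with a TypeError that names the offending type.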