Here is the error:
  File "f**.py", line 34, in <module>
    url_type = url.split('-')[0][-2:]
Here is the whole block:
fit_urls = []
for event_url in event_urls:
    print event_url
    try:
        sock = urllib.urlopen(event_url)
        event_html = sock.read()
        event_soup = BeautifulSoup(event_html)
        tds = event_soup.find_all('td')
        for td in tds:
            for link in td.find_all('a'):
                url = link.get('href')
                url_type = url.split('-')[0][-2:]
                if url_type == 'ht':
                    #print url
                    fit_urls.append(url)
    except HTTPError:
        pass
That is because at least one of your link tags has no href attribute, so link.get('href') returns None and calling split() on it fails. You can verify this by adding print link just before url = link.get('href').
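To see the failure mode in isolation, here is a minimal sketch that does not need BeautifulSoup. It assumes a plain dict named attrs as a stand-in for the attributes of an <a> tag without href; a tag's get() behaves like dict.get() and returns None for a missing attribute. It is written for Python 3, unlike the Python 2 code in the question.

```python
# Hypothetical attributes of an anchor such as <a name="top">: no 'href' key.
attrs = {"name": "top"}

# Like link.get('href') on a tag without href, this returns None.
url = attrs.get("href")

try:
    url.split("-")           # None has no split() method
    error = None
except AttributeError as exc:
    error = exc              # this branch runs when url is None

print(repr(url), type(error).__name__)
```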
To fix this, add an extra if check that skips such links:
for td in tds:
    for link in td.find_all('a'):
        url = link.get('href')
        if url:  # additional check: false when url is None
            url_type = url.split('-')[0][-2:]
            # Your rest of the code
It looks like url = link.get('href') is returning None. You can check for None in your loop:
for td in tds:
    for link in td.find_all('a'):
        url = link.get('href')
        if not url:
            continue
        url_type = url.split('-')[0][-2:]
        if url_type == 'ht':
            #print url
            fit_urls.append(url)
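The check can be tried without any network access. The sketch below is an illustration rather than the asker's actual code: collect_fit_urls and the sample hrefs are hypothetical, and the list stands in for the values link.get('href') would return for each link. It uses Python 3 syntax.

```python
def collect_fit_urls(hrefs):
    """Return the hrefs whose text before the first '-' ends in 'ht'."""
    fit_urls = []
    for url in hrefs:
        if not url:  # skips None (no href attribute) and empty strings
            continue
        url_type = url.split('-')[0][-2:]  # last two characters before the first '-'
        if url_type == 'ht':
            fit_urls.append(url)
    return fit_urls


# Hypothetical hrefs: one missing, one empty, one matching, one not matching.
sample = [None, "", "/events/ht-2016.html", "/events/ab-2016.html"]
result = collect_fit_urls(sample)
print(result)
```

Because the test is "if not url", an empty href is skipped along with a missing one; use "if url is None" instead if empty strings should still be processed.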