ValueError: unsupported format character 'a' (0x61) at index 79
I am trying to scrape data from a website using Beautiful Soup 4 and Python. Here is my code:
from bs4 import BeautifulSoup
import urllib2
i = 0
for i in xrange(0,38):
    page=urllib2.urlopen("http://www.sfap.org/klsfaprep_search?page={}&type=1&strname=&loc=&op=Lancer%20la%20recherche&form_build_id=form-72a297de309517ed5a2c28af7ed15208&form_id=klsfaprep_search_form" %i)
    soup = BeautifulSoup(page.read())
    for eachuniversity in soup.findAll('div',{'class':'field-item odd'}):
        print ''.join(eachuniversity.findAll(text=True)).encode('utf-8')
        print ',\n'
    i= i+ 1
I think the problem is in the URL I have given and in the increment statement. I am able to scrape the pages one at a time; the problem only appears when I use xrange.
Cause of the ValueError: you're mixing {} formatting with % formatting.
>>> '{}%20la' % 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unsupported format character 'a' (0x61) at index 6
>>> '{}%20la'.format(1)
'1%20la'
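To make the difference concrete, here is a minimal Python 3 sketch on a shortened version of the question's URL (the template strings are trimmed for illustration). It shows the two working options, str.format with {} or %-formatting with every literal % doubled to %%, and that applying % to the {} template fails. Note that Python 3 accepts %a as a conversion (it calls ascii()), so there the mixed version fails with a TypeError instead of the ValueError seen on Python 2.

```python
# Python 3 sketch; templates are shortened from the question's URL.
FMT_TEMPLATE = "page={}&op=Lancer%20la%20recherche"   # for str.format
PCT_TEMPLATE = "page=%d&op=Lancer%%20la%%20recherche"  # for %-formatting: literal % doubled


def url_with_format(page):
    # str.format only substitutes {} fields; %20 is copied through unchanged
    return FMT_TEMPLATE.format(page)


def url_with_percent(page):
    # in %-formatting, %% produces a single literal %
    return PCT_TEMPLATE % page


def mixed_style_fails(page):
    # Applying % to the {} template makes Python read %20la, %20r, ... as
    # conversion specifiers. Python 2 raises ValueError at the 'a'; Python 3,
    # where %a is valid, raises TypeError because there are more specifiers
    # than arguments.
    try:
        FMT_TEMPLATE % page
    except (TypeError, ValueError):
        return True
    return False


print(url_with_format(3))    # page=3&op=Lancer%20la%20recherche
print(url_with_percent(3))   # page=3&op=Lancer%20la%20recherche
print(mixed_style_fails(3))  # True
```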
I recommend using {} formatting, because the URL contains multiple % characters.
page=urllib2.urlopen("http://www.sfap.org/klsfaprep_search?page={}&type=1&strname=&loc=&op=Lancer%20la%20recherche&form_build_id=form-72a297de309517ed5a2c28af7ed15208&form_id=klsfaprep_search_form".format(i))
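Another option, which the answer does not use, is to let the standard library build the query string so the %20 escapes never appear in a format template at all. This is a Python 3 sketch using urllib.parse.urlencode; the parameter names and values are copied from the question's URL, and search_url is a hypothetical helper name. Whether the site still accepts that form_build_id token on every request is not known.

```python
from urllib.parse import urlencode, quote

BASE_URL = "http://www.sfap.org/klsfaprep_search"


def search_url(page):
    """Build the search URL for one results page (hypothetical helper, Python 3)."""
    # A list of pairs keeps the parameters in the same order as the original URL.
    params = [
        ("page", page),
        ("type", 1),
        ("strname", ""),
        ("loc", ""),
        ("op", "Lancer la recherche"),  # plain text; urlencode adds the escapes
        ("form_build_id", "form-72a297de309517ed5a2c28af7ed15208"),
        ("form_id", "klsfaprep_search_form"),
    ]
    # quote_via=quote encodes spaces as %20; the default (quote_plus) would use +
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


print(search_url(0))
```

With quote_via=quote the result is character-for-character the same URL as the hand-written template, so it can be swapped in without changing what the server receives.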
You don't need i = 0 and i = i + 1, because for i in xrange(0,38) takes care of it.
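A small Python 3 sketch of why the manual increment is redundant: the for statement assigns the next value from the range on every iteration, so rebinding i inside the body has no effect on which pages are visited. xrange(0, 38) in Python 2, like range(38) here, yields 0 through 37, i.e. 38 pages. pages_visited is a hypothetical helper used only for illustration.

```python
def pages_visited(n):
    """Return the page numbers a for-loop actually visits (Python 3 sketch)."""
    visited = []
    for i in range(n):   # Python 3's range plays the role of Python 2's xrange
        visited.append(i)
        i = i + 1        # only rebinds the name; the loop assigns the next value anyway
    return visited


print(pages_visited(38)[:3], pages_visited(38)[-1])  # [0, 1, 2] 37
```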
import urllib2  # Import standard library module first. (PEP-8)
from bs4 import BeautifulSoup

for i in xrange(0,38):
    page = urllib2.urlopen("http://www.sfap.org/klsfaprep_search?page={}&type=1&strname=&loc=&op=Lancer%20la%20recherche&form_build_id=form-72a297de309517ed5a2c28af7ed15208&form_id=klsfaprep_search_form".format(i))
    soup = BeautifulSoup(page.read())
    for eachuniversity in soup.findAll('div',{'class':'field-item odd'}):
        print ''.join(eachuniversity.findAll(text=True)).encode('utf-8')
        print ',\n'
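For readers on Python 3 without BeautifulSoup installed, here is a rough stdlib-only stand-in for the findAll / ''.join step. It swaps in html.parser.HTMLParser, so it is a sketch and not equivalent to bs4: it only collects text from div elements whose class attribute is exactly "field-item odd" (which is how a string class filter like {'class': 'field-item odd'} matches in bs4), and FieldItemText / field_items are hypothetical names.

```python
from html.parser import HTMLParser


class FieldItemText(HTMLParser):
    """Collect the text of each <div class="field-item odd"> (stdlib sketch)."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside a matching div; counts nested divs
        self.items = []  # one joined string per matching div
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # a div nested inside a matching div
        elif dict(attrs).get("class") == "field-item odd":
            self.depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.items.append("".join(self._buf))

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)


def field_items(html):
    parser = FieldItemText()
    parser.feed(html)
    parser.close()
    return parser.items


sample = '<div class="field-item odd">Univ <b>A</b></div><div class="field-item even">x</div>'
print(field_items(sample))  # ['Univ A']
```

In the loop above, the HTML string would come from something like urllib.request.urlopen(url).read().decode("utf-8"), assuming the pages are served as UTF-8.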