
ValueError: unsupported format character 'a' (0x61) at index 79

I am trying to scrape data from a website using Beautiful Soup 4 and Python. Here is my code:

from bs4 import BeautifulSoup
import urllib2
i = 0
for i in xrange(0,38):
    page=urllib2.urlopen("http://www.sfap.org/klsfaprep_search?page={}&type=1&strname=&loc=&op=Lancer%20la%20recherche&form_build_id=form-72a297de309517ed5a2c28af7ed15208&form_id=klsfaprep_search_form" %i) 
    soup = BeautifulSoup(page.read())
    for eachuniversity in soup.findAll('div',{'class':'field-item odd'}):
        print ''.join(eachuniversity.findAll(text=True)).encode('utf-8')
    print ',\n'
i= i+ 1

I think the problem is in the URL I have given and in the increment statement. I am able to scrape page by page, but only when I use xrange.

Reason for the ValueError

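You're mixing {} formatting with % formatting. The % operator ignores {} and instead parses %20la in the URL as a conversion specifier: a field width of 20, the length modifier l, and then the conversion type a, which Python 2 does not support. That a sits at index 79 of the URL string, which is why the error reports that position.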

>>> '{}%20la' % 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unsupported format character 'a' (0x61) at index 6
>>> '{}%20la'.format(1)
'1%20la'

I recommend using {} formatting, because the URL contains multiple % characters.

page=urllib2.urlopen("http://www.sfap.org/klsfaprep_search?page={}&type=1&strname=&loc=&op=Lancer%20la%20recherche&form_build_id=form-72a297de309517ed5a2c28af7ed15208&form_id=klsfaprep_search_form".format(i))
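If you would rather keep % formatting, every literal percent sign in the string has to be doubled so the % operator does not read it as the start of a conversion specifier. The following is a minimal sketch of that alternative; the shortened URL template and the page_url helper are made up for illustration and are not part of the original code.

```python
# Hypothetical, shortened URL template used only for illustration.
# Each literal '%' is written as '%%' so that the % operator leaves it alone;
# '%d' is the only real placeholder.
TEMPLATE = "http://www.sfap.org/klsfaprep_search?page=%d&op=Lancer%%20la%%20recherche"


def page_url(page_number):
    """Return the search URL for one page using %-style formatting."""
    return TEMPLATE % page_number


print(page_url(3))
```

Doubling every percent sign by hand is easy to get wrong in a long URL, which is one reason .format() is the more comfortable choice here.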

Complete code

You don't need i = 0 and i = i + 1, because for i in xrange(0,38) already takes care of incrementing i.

import urllib2  # Import standard-library modules first (PEP 8).

from bs4 import BeautifulSoup

for i in xrange(0,38):
    page = urllib2.urlopen("http://www.sfap.org/klsfaprep_search?page={}&type=1&strname=&loc=&op=Lancer%20la%20recherche&form_build_id=form-72a297de309517ed5a2c28af7ed15208&form_id=klsfaprep_search_form".format(i))
    soup = BeautifulSoup(page.read())
    for eachuniversity in soup.findAll('div',{'class':'field-item odd'}):
        print ''.join(eachuniversity.findAll(text=True)).encode('utf-8')
    print ',\n'
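Another way to sidestep the clash between percent-encoding and string formatting is to build the query string from unencoded values and let the standard library do the escaping. The sketch below assumes Python 3 (where urlencode and quote live in urllib.parse, and quote_via needs 3.5 or later), unlike the Python 2 code above. The build_search_url helper and BASE_URL constant are hypothetical names; the parameter values are copied from the URL in the question.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://www.sfap.org/klsfaprep_search"


def build_search_url(page_number):
    """Build the search URL for one result page (hypothetical helper)."""
    # A list of pairs keeps the parameters in the same order as the original URL.
    params = [
        ("page", page_number),
        ("type", 1),
        ("strname", ""),
        ("loc", ""),
        ("op", "Lancer la recherche"),  # plain text; urlencode escapes the spaces
        ("form_build_id", "form-72a297de309517ed5a2c28af7ed15208"),
        ("form_id", "klsfaprep_search_form"),
    ]
    # quote_via=quote encodes spaces as %20; the default (quote_plus) would use '+'.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


print(build_search_url(3))
```

Because the page number is passed in as an ordinary value, no formatting placeholder ever shares a string with a literal % sign.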

Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost this content, please cite this site's URL or the original address.
