URL error in BeautifulSoup

Question

I am trying to obtain the data-pid and price from Craigslist using BeautifulSoup. I have written separate code that produces the file CLallsites.txt. In this code I am trying to read each of those sites from the txt file and get the PIDs of all entries in the first 10 pages. My code is:

  from bs4 import BeautifulSoup       
  from urllib2 import urlopen 
  readfile = open("CLallsites.txt")
  product = "mcy"
  while 1:
    u = ""
    count = 0
    line = readfile.readline()
    commaposition = line.find(',')
    site = line[0:commaposition]
    location = line[commaposition+1:]
    site_filename = location + '.txt'
    f = open(site_filename, "a")
    while (count < 10):
       sitenow = site + "\\" + product + "\\" + str(u)
       html = urlopen(str(sitenow))                      
       soup = BeautifulSoup(html)                
       postings = soup('p',{"class":"row"})
       for post in postings:
            y = post['data-pid']
            print y
       count = count +1
       index = count*100
       u = "index" + str(index) + ".html"
    if not line:
        break
    pass

My CLallsites.txt looks like this:

craigslist site, location (Stack Overflow does not allow posting Craigslist links, so I cannot show the text. I could try to attach the text file if that helps.)

When I run the code I get the following error:

  Traceback (most recent call last):
    File "reading.py", line 16, in <module>
      html = urlopen(str(sitenow))
    File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
      return _opener.open(url, data, timeout)
    File "/usr/lib/python2.7/urllib2.py", line 400, in open
      response = self._open(req, data)
    File "/usr/lib/python2.7/urllib2.py", line 418, in _open
      '_open', req)
    File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
      result = func(*args)
    File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
      return self.do_open(httplib.HTTPConnection, req)
    File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
      raise URLError(err)
  urllib2.URLError:

Any ideas about what I am doing wrong?

Answer 1

I don't know what sitenow contains, but it looks like an invalid URL. Note that URLs use forward slashes, not backslashes, so the statement should be something like sitenow = site + "/" + product + "/" + str(u).
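To make the slash fix concrete, here is a minimal sketch of how the ten page URLs could be built. It is written for Python 3, whereas the question's code is Python 2. The helper names parse_site_line and page_urls and the host http://example.org are made up for illustration. It assumes each line of CLallsites.txt has the form "site,location" and that pages after the first are named index100.html, index200.html and so on, as in the question's loop.

```python
# Python 3 sketch; helper names and the example host are hypothetical.

def parse_site_line(line):
    """Split one 'site,location' line; strip() drops the trailing newline."""
    site, _, location = line.strip().partition(",")
    return site.strip(), location.strip()


def page_urls(site, product, pages=10, step=100):
    """Return the listing URLs for the first `pages` pages of `product`."""
    # Join with forward slashes; backslashes are not valid URL separators.
    base = site.rstrip("/") + "/" + product + "/"
    urls = [base]  # the first page has no index suffix
    for n in range(1, pages):
        urls.append(base + "index" + str(n * step) + ".html")
    return urls


if __name__ == "__main__":
    # Placeholder host, not a real Craigslist site.
    site, location = parse_site_line("http://example.org, somewhere\n")
    for url in page_urls(site, "mcy", pages=3):
        print(url)
```

Printing each URL before passing it to urlopen is a quick way to check what sitenow actually contains. Stripping the line also keeps the trailing newline out of location, which the question's code would otherwise carry into the output filename.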

Solution 1: 0, accepted, 2013-03-27 00:04:56
