URL error in Beautiful Soup

Question

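I am trying to use BeautifulSoup to get the PID and price data from Craigslist. I wrote a separate piece of code that gives me the file CLallsites.txt. In this code, I am trying to take each site from the txt file and get the PID of every entry in the first 10 pages. My code is: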

  from bs4 import BeautifulSoup       
  from urllib2 import urlopen 
  readfile = open("CLallsites.txt")
  product = "mcy"
  while 1:
    u = ""
    count = 0
    line = readfile.readline()
    commaposition = line.find(',')
    site = line[0:commaposition]
    location = line[commaposition+1:]
    site_filename = location + '.txt'
    f = open(site_filename, "a")
    while (count < 10):
       sitenow = site + "\\" + product + "\\" + str(u)
       html = urlopen(str(sitenow))                      
       soup = BeautifulSoup(html)                
       postings = soup('p',{"class":"row"})
       for post in postings:
            y = post['data-pid']
            print y
       count = count +1
       index = count*100
       u = "index" + str(index) + ".html"
    if not line:
        break
    pass

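My CLallsites.txt looks like this:

craigslist site, location (Stack Overflow does not allow posts containing craigslist links, so I cannot show the text; I can try to attach the text file if that helps.)

When I run the code, I get the following error: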

  Traceback (most recent call last):
    File "reading.py", line 16, in <module>
      html = urlopen(str(sitenow))
    File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
      return _opener.open(url, data, timeout)
    File "/usr/lib/python2.7/urllib2.py", line 400, in open
      response = self._open(req, data)
    File "/usr/lib/python2.7/urllib2.py", line 418, in _open
      '_open', req)
    File "/usr/lib/python2.7/urllib2.py", line 378, in _call_chain
      result = func(*args)
    File "/usr/lib/python2.7/urllib2.py", line 1207, in http_open
      return self.do_open(httplib.HTTPConnection, req)
    File "/usr/lib/python2.7/urllib2.py", line 1177, in do_open
      raise URLError(err)
  urllib2.URLError:

Any ideas about what I am doing wrong?

Answer 1

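I don't know what sitenow contains, but it looks like it is not a valid URL. Note that URLs use forward slashes, not backslashes, so the statement should be something like sitenow = site + "/" + product + "/" + str(u).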
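
To make that concrete, here is a rough sketch of how the URL could be built with forward slashes. The helper name build_page_url and the example.org host are made up for illustration; the paging scheme (an empty suffix for the first page, then index100.html, index200.html, and so on) is copied from the loop in the question. The strip() call is an extra precaution I am assuming is wanted, in case the site string read from the file carries stray whitespace.

```python
def build_page_url(site, product, page):
    """Return the listing URL for a 0-based page number.

    Hypothetical helper: mirrors the question's paging scheme, where the
    first page has no suffix and page n uses index<n*100>.html.
    """
    # Drop surrounding whitespace and any trailing slash from the site part
    base = site.strip().rstrip("/")
    # The first page has an empty suffix, later pages use index100.html, ...
    suffix = "" if page == 0 else "index%d.html" % (page * 100)
    # URLs are joined with forward slashes, never backslashes
    return base + "/" + product + "/" + suffix


if __name__ == "__main__":
    # Print the first three page URLs to check them before calling urlopen
    for page in range(3):
        print(build_page_url("http://example.org", "mcy", page))
```

Printing the URL before passing it to urlopen is also a quick way to see exactly what sitenow contains when the request fails.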

Solution 1: accepted, 2013-03-27 00:04:56
