
URLOpen error while combining a URL with a word from a wordlist

Hey guys, I'm making a Python web crawler at the moment. I have a link whose last characters are "search?q=", and after that I append a word from my wordlist, which I loaded into a list beforehand. But when I try to open it with urllib2.urlopen(url), it throws an error (urlopen error no host given). When I open the same link with urllib and type in the word by hand instead of having it appended automatically, it works fine. Can you tell me why this is happening?

Thanks and regards

Full error:

  File "C:/Users/David/PycharmProjects/GetAppResults/main.py", line 61, in <module>
    getResults()
  File "C:/Users/David/PycharmProjects/GetAppResults/main.py", line 40, in getResults
    usock = urllib2.urlopen(url)
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 402, in open
    req = meth(req)
  File "C:\Python27\lib\urllib2.py", line 1113, in do_request_
    raise URLError('no host given')
urllib2.URLError: <urlopen error no host given>

Code:

with open(filePath, "r") as ins:
    wordList = []
    for line in ins:
        wordList.append(line)

def getResults():
    packageID = ""
    count = 0
    word = "Test"
    for x in wordList:
        word = x
        print word
        url = 'http://www.example.com/search?q=' + word
        usock = urllib2.urlopen(url)
        page_source = usock.read()
        usock.close()
        print page_source
        startSequence = "data-docid=\""
        endSequence = "\""
        while page_source.find(startSequence) != -1:
            start = page_source.find(startSequence) + len(startSequence)
            end = page_source.find(endSequence, start)
            print str(start)
            print str(end)
            link = page_source[start:end]
            print link
            if link:
                if not link in packageID:
                    packageID += link + "\r\n"
                    print packageID
            page_source = page_source[end + len(endSequence):]
        count+=1

When I print the string word, it outputs the correct word from the wordlist.

I solved the problem. I am now simply using urllib instead of urllib2, and everything works fine. Thank you all :)
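Separately from the switch to urllib, one thing that printing the word does not reveal is that every line read from a file keeps its trailing newline, so that newline ends up inside the URL. Whether this is what triggered the error above is not confirmed. As a minimal sketch, assuming Python 3 (where urllib2 became urllib.request) and two hypothetical helpers, load_words and build_search_url, the words could be stripped and URL-encoded before the link is built:

```python
from urllib.parse import urlencode

# Placeholder host taken from the question; the real target is an assumption.
BASE_URL = "http://www.example.com/search"


def load_words(lines):
    """Strip surrounding whitespace (including newlines) and drop empty lines."""
    return [line.strip() for line in lines if line.strip()]


def build_search_url(word, base_url=BASE_URL):
    """Return base_url with word as a URL-encoded q parameter."""
    return base_url + "?" + urlencode({"q": word})
```

With a file, this would be used as load_words(open(filePath)) and then build_search_url(word) for each word, passing the result to urlopen().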

Note that urlopen() returns a response, not a request.

You may have a broken proxy configuration; verify that your proxies are working:

print(urllib.request.getproxies())

or bypass proxy support altogether with:

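import urllib.request

# urllib.request.urlopen() takes no proxies argument; an empty ProxyHandler disables proxies.
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
response = opener.open("http://www.example.com/search?q=" + text_to_check)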
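What "working" means for a proxy entry is not spelled out here. One cheap check, offered only as an assumption about what might be wrong, is whether each configured proxy URL actually parses to a hostname. The helper name find_broken_proxies is hypothetical; the sketch takes the dictionary that urllib.request.getproxies() returns:

```python
from urllib.parse import urlsplit


def find_broken_proxies(proxies):
    """Return the schemes whose proxy URL has no parseable hostname.

    proxies is a mapping such as the one returned by
    urllib.request.getproxies(), e.g. {"http": "http://proxy:8080"}.
    """
    broken = []
    for scheme, proxy_url in sorted(proxies.items()):
        if scheme == "no":
            # A "no" entry (from no_proxy) lists hosts to bypass, not a proxy URL.
            continue
        if urlsplit(proxy_url).hostname is None:
            broken.append(scheme)
    return broken
```

Calling find_broken_proxies(urllib.request.getproxies()) and getting a non-empty list would point at an environment variable or system setting to fix; an empty list does not prove the proxies are reachable.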

Here is a sample way of combining a URL with a word from a wordlist. It joins entries from the two lists to build the URL, finds the images on that page, and downloads them. Wrap it in a loop to go through the whole list you have.

# Python 2
import urllib
import re

print "The URL crawler starts.."

mylist = ["http://www.ebay", "https://www.npmjs.org/"]
wordlist = [".com", "asss"]

# Combine the first URL with the first word and fetch the page.
urlcontent = urllib.urlopen(mylist[0] + wordlist[0]).read()
# Collect the src attribute of every img tag.
imgUrls = re.findall('img .*?src="(.*?)"', urlcontent)

# Download each image as 1.jpg, 2.jpg, ...
x = 1
for imgUrl in imgUrls:
    print imgUrl
    urllib.urlretrieve(imgUrl, str(x) + ".jpg")
    x = x + 1
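
The "loop it around" step and the image extraction can also be kept separate from the download, which makes them easy to try without a network connection. This is only a sketch, assuming Python 3; candidate_urls and extract_image_urls are hypothetical helper names, and the regular expression is a rough stand-in for a real HTML parser:

```python
import re
from urllib.parse import urljoin

# Matches the src attribute of an img tag; a simplification, not a full HTML parser.
IMG_SRC = re.compile(r'<img\b[^>]*?\bsrc="([^"]*)"', re.IGNORECASE)


def candidate_urls(bases, suffixes):
    """Yield every base + suffix combination, in order."""
    for base in bases:
        for suffix in suffixes:
            yield base + suffix


def extract_image_urls(page_url, html):
    """Return the image URLs found in html, resolved against page_url."""
    return [urljoin(page_url, src) for src in IMG_SRC.findall(html)]
```

Each URL from candidate_urls(mylist, wordlist) would then be fetched, its content decoded to text and passed to extract_image_urls, and every result handed to urllib.request.urlretrieve.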

Hope this helps; otherwise, post your code and error logs.


 