I'd like to download some HTML source with urllib2 or mechanize (using .read()). Unfortunately, the page I want is quite large, and with both libraries I only get a string of up to 65747 characters; the rest of the page is missing. This really bugs me and I don't know how to deal with it. Can someone give me a hint?
EDIT: Here's a snippet of the code I use.
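import cookielib
import urllib2

cj = cookielib.LWPCookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
dataHTML = ""
fp = opener.open(url)
# Keep reading until read() returns an empty string (end of stream).
while 1:
    r = fp.read()
    if r == '':
        break
    dataHTML += r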
You can call read() several times:
b = ''
while 1:
    r = f.read()
    if r == '':
        break
    b += r
Does that work better?
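For what it's worth, here is a minimal sketch of the same loop with an explicit chunk size, assuming the response behaves like any file-like object whose read() returns an empty string at the end of the stream. The helper name read_all, the chunk size of 8192 and the in-memory fake_response are my own assumptions for illustration; they are not taken from urllib2 or mechanize.

```python
import io


def read_all(fp, chunk_size=8192):
    """Read a file-like object until read() returns an empty result."""
    chunks = []
    while True:
        chunk = fp.read(chunk_size)
        if not chunk:  # empty result signals the end of the stream
            break
        chunks.append(chunk)
    # Join once at the end instead of concatenating inside the loop.
    return b"".join(chunks)


# Stand-in for a large HTTP response body (hypothetical data, no network).
fake_response = io.BytesIO(b"x" * 200000)
data = read_all(fake_response)
print(len(data))
```

With a real response you would pass the object returned by opener.open(url) instead of fake_response. Collecting the chunks in a list and joining them once avoids rebuilding the string on every pass.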