
Loading HTML source with urllib2/mechanize in Python

I'd like to download some HTML source with urllib2 or mechanize (using .read()). Unfortunately, the page I want is quite large, and with both libraries I only get a string of up to 65747 characters; the remaining tail is cut off. This really bugs me and I don't know how to deal with the problem. Can someone give me a hint?

EDIT: Here's a snippet of the code I use.

import cookielib
import urllib2

# build an opener that keeps cookies across requests
cj = cookielib.LWPCookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

dataHTML = ""
fp = opener.open(url)

# read until read() returns an empty string (end of the response)
while 1:
    r = fp.read()
    if r == '':
        break
    dataHTML += r
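
One way to tell whether the server is really sending more than 65747 characters, or whether data is being lost on the client side, is to compare the number of bytes received with the Content-Length header (when the server sends one). The following is only a diagnostic sketch; the url value is a placeholder.

import cookielib
import urllib2

url = 'http://example.com/large-page'  # placeholder URL for illustration

cj = cookielib.LWPCookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
fp = opener.open(url)

# Content-Length is only available when the server sends it
# (chunked responses carry no such header)
expected = fp.info().getheader('Content-Length')

dataHTML = ""
while 1:
    r = fp.read()
    if r == '':
        break
    dataHTML += r

print 'Content-Length header:', expected
print 'bytes actually read:', len(dataHTML)

If the two numbers differ, the connection is being closed early; if they match, the page really is that size.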

You can call read() several times:

# f is the response object returned by opener.open(url)
b = ''
while 1:
    r = f.read()
    if r == '':
        break
    b += r

Does that work better?
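
A slightly more idiomatic version of the same loop reads the response in fixed-size chunks and joins them once at the end, which avoids repeated string concatenation on large pages. This is just a sketch and assumes f is the response object returned by opener.open(url):

chunks = []
while 1:
    r = f.read(8192)   # read up to 8 KB per call
    if r == '':        # empty string means the stream is exhausted
        break
    chunks.append(r)
b = ''.join(chunks)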
