
Loading HTML source with urllib2/mechanize in Python

I'd like to download some HTML source with urllib2 or mechanize (using .read()). Unfortunately, the page I want is quite large, and with both libraries I only get a string of up to 65747 characters; the remaining tail is cut off. This really bugs me and I don't know how to deal with the problem. Can someone give me a hint?

EDIT: Here's a snippet of the code I use.

import cookielib
import urllib2

# build an opener that keeps cookies across requests
cj = cookielib.LWPCookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

dataHTML = ""
fp = opener.open(url)

# read until read() returns an empty string (end of the response)
while 1:
    r = fp.read()
    if r == '':
        break
    dataHTML += r
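
One way to tell whether the server is really sending more than 65747 characters, or whether data is being lost on the client side, is to compare the number of bytes received with the Content-Length header (when the server sends one). The following is only a diagnostic sketch; the url value is a placeholder.

import cookielib
import urllib2

url = 'http://example.com/large-page'  # placeholder URL for illustration

cj = cookielib.LWPCookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
fp = opener.open(url)

# Content-Length is only available when the server sends it
# (chunked responses carry no such header)
expected = fp.info().getheader('Content-Length')

dataHTML = ""
while 1:
    r = fp.read()
    if r == '':
        break
    dataHTML += r

print 'Content-Length header:', expected
print 'bytes actually read:', len(dataHTML)

If the two numbers differ, the connection is being closed early; if they match, the page really is that size.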

You can call read() several times:

# f is the response object returned by opener.open(url)
b = ''
while 1:
    r = f.read()
    if r == '':
        break
    b += r

Does that work better?
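
A slightly more idiomatic version of the same loop reads the response in fixed-size chunks and joins them once at the end, which avoids repeated string concatenation on large pages. This is just a sketch and assumes f is the response object returned by opener.open(url):

chunks = []
while 1:
    r = f.read(8192)   # read up to 8 KB per call
    if r == '':        # empty string means the stream is exhausted
        break
    chunks.append(r)
b = ''.join(chunks)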
