Printing the same HTTPResponse object returns different output - Python

Question

def crawl(url):
    html = getHTML(url)  # getHTML() returns an HTTPResponse
    print(html.read())  # PRINT STATEMENT 1
    if html is None:
        print("Error getting HTML")
    else:
        # parse html
        bsObj = BeautifulSoup(html, "lxml")
        # print data
        try:
            print(bsObj.h1.get_text())
        except AttributeError as e:
            print(e)

        print(html.read())  # PRINT STATEMENT 2

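What I don't understand is this:

Print statement 1 prints the whole HTML, while print statement 2 prints only b''.

What is going on here? I am very new to Python.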

Answer 1

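html is an HTTPResponse object. HTTPResponse supports file-like operations such as read().

Just as when reading a file, read() consumes the available data and moves the file pointer to the end of the file/data. A subsequent read() has nothing left to return, so it gives back an empty bytes object (b'').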
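The consuming behaviour of read() can be sketched without a network request. The example below uses io.BytesIO as a stand-in for the response, which is an assumption made only so that it runs offline; the sample HTML is made up as well.

```python
import io

# io.BytesIO stands in for the HTTPResponse here (an assumption made so the
# example runs without a network connection); both are file-like objects.
response = io.BytesIO(b"<html><h1>Title</h1></html>")

first = response.read()   # consumes everything up to the end of the data
second = response.read()  # nothing is left, so this is an empty bytes object

print(first)
print(second)
```

The second read() returns empty bytes rather than raising an error, which is why print statement 2 in the question shows b''.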

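You have two options:

If the object is seekable, reset the file pointer to the start with seek() after reading:

    print(html.read())
    html.seek(0)  # moves the file pointer to byte 0 relative to the start of the file/data

This works for seekable file-like objects such as regular files. Note that if getHTML() returns an http.client.HTTPResponse (as urllib.request.urlopen() does), that object is not seekable, and seek(0) raises io.UnsupportedOperation on it.

Or save the result instead:

    html_body = html.read()
    print(html_body)

You will usually want the second option, because html_body is easier to reuse.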
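A minimal sketch of the save-the-result approach follows. The helper name read_body and the sample HTML are hypothetical, and io.BytesIO again stands in for the real response so that the example runs offline.

```python
import io


def read_body(response):
    """Read a file-like response exactly once and return its bytes.

    Hypothetical helper, not part of the original question's code.
    """
    return response.read()


# Stand-in for the HTTPResponse returned by getHTML() (assumption: no network).
response = io.BytesIO(b"<html><h1>Title</h1></html>")

html_body = read_body(response)

# The saved bytes can be reused any number of times.
print(html_body)
print(len(html_body))

# If a file-like object is needed again, wrap the bytes in a fresh buffer.
buffer = io.BytesIO(html_body)
first_pass = buffer.read()
buffer.seek(0)  # BytesIO is seekable, so rewinding works here
second_pass = buffer.read()
```

In crawl(), the saved html_body could then be passed to BeautifulSoup in place of html, since BeautifulSoup accepts markup as bytes or a string as well as a file-like object.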

Solution 1
1 Accepted 2016-06-19 09:33:12
