Error when crawling the web with Python

Question

When I try to run the code below, the following error is returned. I would much appreciate it if someone could point out what I did wrong. Thank you.

Traceback (most recent call last):
  File "web_crawler.py", line 26, in <module>
    links = get_all_links(page)
  File "web_crawler.py", line 14, in get_all_links
    url, endpos = get_next_target(page)
  File "web_crawler.py", line 2, in get_next_target
    start_link = page.find("<a href=")
TypeError: a bytes-like object is required, not 'str'

def get_next_target(page):
    start_link = page.find("<a href=")
    if start_link == -1:
        return None, 0
    start_quote = page.find('"',start_link)
    end_quote = page.find('"',start_quote+1)
    url = page[start_quote+1:end_quote]
    print(url)
    return url, end_quote

def get_all_links(page):
    links = []
    while True:
        url, endpos = get_next_target(page)
        if url:
            links.append(url)
            page = page[endpos:]
        else:
            break
    return links

import requests
url='https://en.wikipedia.org/wiki/Moon'
r = requests.get(url)
page = r.content
links = get_all_links(page)

Answer 1

response.content is the raw content of the response. It is not decoded in any way; it is just the raw bytes. That is why page.find("<a href=") fails: page is a bytes object, and the search argument is a str.

What you want to use instead is the response.text attribute, which contains the decoded content as a string.
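To make the difference concrete, here is a minimal sketch that reproduces the error without touching the network. The variable raw is a made-up stand-in for what r.content would hold, and decoding it as UTF-8 is an assumption that only approximates what r.text does (requests chooses the encoding for you).

```python
# Hypothetical stand-in for r.content: the body of a response as raw bytes.
raw = b'<html><a href="/wiki/Earth">Earth</a></html>'

# Searching bytes with a str argument raises TypeError, as in the traceback.
raised = False
try:
    raw.find("<a href=")
except TypeError:
    raised = True

# Decoding gives a str, which is roughly what r.text provides.
text = raw.decode("utf-8")
start = text.find("<a href=")  # str searched with str works

print(raised, start)
```

In the question's script, the corresponding change would be to assign page = r.text instead of page = r.content.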

(You probably also want to use an HTML parsing library like BeautifulSoup instead of your current page.find approach.)
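As a rough illustration of the parser-based approach, the sketch below collects links with the standard library's html.parser module rather than BeautifulSoup, purely so that it runs without extra installs. The class name LinkCollector and the sample string are hypothetical; get_all_links keeps the name from the question but is rewritten here.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def get_all_links(page):
    """Return all link targets found in page, which must be a str."""
    collector = LinkCollector()
    collector.feed(page)
    collector.close()
    return collector.links


# Made-up sample input; in the question this would be r.text.
sample = '<p><a href="/wiki/Earth">Earth</a> and <a class="x" href=\'/wiki/Sun\'>Sun</a></p>'
links = get_all_links(sample)
print(links)
```

Unlike searching for the literal text <a href=, a parser also finds links whose href is not the first attribute or is written with single quotes. With BeautifulSoup installed, something along the lines of BeautifulSoup(r.text, "html.parser").find_all("a", href=True) should give a comparable result.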


Answer 1
3 2018-04-13 18:38:36
