How to log in to a website and scrape it with Python 3

Question

I would like to log in to Facebook Messenger and parse the HTML.

import requests
from bs4 import BeautifulSoup
import webbrowser
page = requests.get("https://www.messenger.com", auth=
('username', 'password'))

soup = BeautifulSoup(page, 'html.parser')

print(soup)

I got this from another Stack Overflow question, but it throws this error:

  File "C:/Code/Beautiful Soup Web Scraping.py", line 7, in <module>
    soup = len(BeautifulSoup(page, 'html.parser'))
  File "C:\Users\Ethan\AppData\Local\Programs\Python\Python37\lib\site-packages\bs4\__init__.py", line 246, in __init__
    elif len(markup) <= 256 and (
TypeError: object of type 'Response' has no len()

How can I get this to work?

Answer 1

You must pass BeautifulSoup the content of the web page, not the Response object returned by requests.get. To get the content, use the Response.content property.

In your example, use: soup = BeautifulSoup(page.content, 'html.parser')
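As a slightly fuller sketch of the same fix, the snippet below separates downloading from parsing. The helper names parse_response and fetch_soup and the timeout value are made up for illustration; only requests.get, Response.content, and the BeautifulSoup constructor come from the answer above.

```python
import requests
from bs4 import BeautifulSoup


def parse_response(response):
    """Build a BeautifulSoup tree from a requests Response."""
    # Raise an HTTPError for 4xx/5xx replies instead of parsing an error page.
    response.raise_for_status()
    # .content is the raw body as bytes. Passing the Response object itself
    # is what triggers "object of type 'Response' has no len()".
    return BeautifulSoup(response.content, "html.parser")


def fetch_soup(url, timeout=10):
    """Download url and parse it (hypothetical helper, not from the question)."""
    return parse_response(requests.get(url, timeout=timeout))
```

Calling fetch_soup("https://www.messenger.com") should then return a soup object you can print or search. Response.text (the body decoded to str) would also work in place of Response.content.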

Answer 2

I would recommend using Selenium, which will allow you to log in to Facebook, navigate to the desired page, and retrieve the HTML. You can then pass the HTML to BeautifulSoup. Take a look at this blog post to get started.

Answer 1
0 2018-11-30 20:38:21

Answer 2
0 2018-11-30 21:01:17