Scraping a web page with webbot

Question

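I'm trying to write a simple program that logs in to a web page with my credentials and gets the total flex dollars remaining in my university account. Starting from the login page, I log in and am redirected to the page of interest, where I just want to grab that amount and do some operations on it.

I'm currently using webbot for the login part, which works well; I've only redacted my credentials: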

from webbot import Browser

web = Browser()
web.go_to('insert my url here')
# Enter your username and password in the input fields below
web.type('insert email here', into='username')
web.type('insert password here', into='password')
web.click('Login', tag='span')

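So far this works perfectly: it creates a Chrome instance and logs in to the page I want to get the dollar amount from. I thought I might continue with urllib, but I don't think urllib would benefit from the Chrome instance I'm currently logged in to. How can I get around this and grab a simple HTML element from the page?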

Answer 1

You first need to get the HTML source of the current page, which you can do with get_page_source(). Then pass that HTML source to BeautifulSoup:

from webbot import Browser
from bs4 import BeautifulSoup
import time

web = Browser()
web.go_to('insert my url here')
# Enter your username and password in the input fields below
web.type('insert email here', into='username')
web.type('insert password here', into='password')
web.click('Login', tag='span')
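time.sleep(5)  # crude fixed wait so the post-login redirect can finish loading

content = web.get_page_source()
# Name the parser explicitly; without it bs4 picks one itself and emits a warning
soup = BeautifulSoup(content, "html.parser")

# You can now find the element you want, e.g. every <a> tag with class "item-title"
samples = soup.find_all("a", class_="item-title")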
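
Once the element has been located, its text still has to be turned into a number before you can "do some operations" with it. The following is a minimal sketch of that step, not part of the original answer: the helper name parse_dollar_amount and the sample strings are assumptions made up for illustration, and the real page may format the balance differently.

```python
import re
from decimal import Decimal

# Optional minus sign, optional "$", then digits with optional
# thousands separators and an optional fractional part.
_AMOUNT_RE = re.compile(r"(-?)\$?\s*(\d[\d,]*(?:\.\d+)?)")


def parse_dollar_amount(text: str) -> Decimal:
    """Return the first dollar amount found in text as a Decimal.

    Hypothetical helper: assumes US-style formatting such as "$1,234.50".
    """
    match = _AMOUNT_RE.search(text)
    if match is None:
        raise ValueError(f"no dollar amount found in {text!r}")
    sign, digits = match.groups()
    # Drop thousands separators before converting; Decimal avoids
    # the rounding surprises of float when working with money.
    return Decimal(sign + digits.replace(",", ""))


if __name__ == "__main__":
    balance = parse_dollar_amount("Flex Dollars: $1,234.50")
    print(balance - Decimal("10.25"))
```

With the soup object from the answer, this could be called as parse_dollar_amount(soup.find(id="flex-balance").get_text()), where the id "flex-balance" is a placeholder for whatever element actually holds the balance on your page.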

Answer 1: accepted, 2019-12-29 02:00:24