Web Scraping using webbot

I am trying to create a simple program that logs in to a webpage with my credentials and grabs the total amount of flex dollars left in my college account. Starting at the login page, I log in and am redirected to the page of interest; I simply want to grab that dollar amount and perform some manipulation on it.

I am currently using webbot for the login portion, which works. I have just redacted the credentials:

from webbot import Browser

web = Browser()
web.go_to('insert my url here')
# Enter your username and password in the fields below
web.type('insert email here', into='username')
web.type('insert password here', into='password')
web.click('Login', tag='span')

This works perfectly so far: it creates an instance of Chrome and logs in to the page I want to grab the dollar amount from. I imagine I might want to proceed using urllib; however, I don't think urllib benefits from my currently logged-in instance of Chrome. How can I work around this and grab a simple HTML element from the page?

You first need to get the HTML source code for the current webpage. You can do that using get_page_source(). You then need to pass the HTML source code to BeautifulSoup:

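from webbot import Browser
from bs4 import BeautifulSoup
import time

web = Browser()
web.go_to('insert my url here')
# Enter your username and password in the fields below
web.type('insert email here', into='username')
web.type('insert password here', into='password')
web.click('Login', tag='span')
# Crude wait so the post-login redirect can finish loading
time.sleep(5)

# Grab the HTML of the page the logged-in browser is currently showing
content = web.get_page_source()
# Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
soup = BeautifulSoup(content, 'html.parser')

# You can now find the element you want,
# e.g. every <a> element whose CSS class is "item-title"
samples = soup.find_all("a", "item-title")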
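
To get from the parsed page to a number you can do arithmetic on, something like the following might work. It is only a sketch: the SAMPLE_HTML markup, the "balance" class name, and the extract_balance helper are all made up for illustration, since the real page's structure is not shown in the question. In practice you would pass the string returned by web.get_page_source() instead of SAMPLE_HTML and adjust the tag and class to match what you see in the browser's inspector.

```python
import re
from decimal import Decimal

from bs4 import BeautifulSoup

# Stand-in for web.get_page_source(); this markup is hypothetical.
SAMPLE_HTML = """
<html><body>
  <div class="account">
    <span class="label">Flex Dollars</span>
    <span class="balance">$1,234.56</span>
  </div>
</body></html>
"""


def extract_balance(html, css_class="balance"):
    """Return the dollar amount in the first <span> with the given class, or None."""
    soup = BeautifulSoup(html, "html.parser")
    element = soup.find("span", class_=css_class)
    if element is None:
        return None  # no such element on the page
    # Pull out the numeric part, allowing thousands separators and cents
    match = re.search(r"\d[\d,]*(?:\.\d+)?", element.get_text())
    if match is None:
        return None  # element found, but it holds no number
    # Decimal avoids floating-point rounding surprises with money
    return Decimal(match.group().replace(",", ""))


balance = extract_balance(SAMPLE_HTML)
if balance is not None:
    # Example manipulation: balance left after a 10-dollar purchase
    print(balance - Decimal("10.00"))
```

Returning None rather than raising keeps the caller in control when the login failed or the page layout changed and the element is missing.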
