Python web scraper and input

Original post: 2023-01-25 07:43:51 | Tags: python, web-scraping

I started building a program, for personal use at work, that scrapes mortgage rates from websites and enters data into them. Essentially, I want the program to log into each website, enter the necessary mortgage data, and return the rates so that I can compare the sites without doing this manually on each one.

The problem I didn't think of is the login portion. I would have to store tokens and a few other items in order to navigate from page to page within each website.

My question is: is this even possible, given that I don't know which credentials/tokens to send to each page within a site? (I have the login info, but I'm unsure whether I need more than just the credentials and the tokens.)

1 Answer

This is complicated with just the requests module, since you would have to find and resend each site's tokens yourself.

Note that the browser-based approach below requires more system resources than plain HTTP requests.

You can use Playwright to control a Chromium instance.

Like nearly every other browser, Chromium stores the credentials and tokens for you, so you only have to program the browser to log in and scrape.
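
Here is a rough sketch of what that could look like with Playwright's sync API. Treat it as a starting point, not a finished scraper: the Site fields, the selectors, and the helper names (parse_rate, fetch_rate, best_offer) are all placeholders I made up, and every real site will need its own selectors and its own steps for entering the mortgage data. It assumes you have run "pip install playwright" and "playwright install chromium".

```python
import re
from dataclasses import dataclass


@dataclass
class Site:
    # Every value stored here is a hypothetical placeholder, not a real lender.
    name: str
    login_url: str
    user_selector: str
    password_selector: str
    submit_selector: str
    rate_selector: str


def parse_rate(text: str) -> float:
    """Pull the first percentage such as '6.125%' out of scraped text."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*%", text)
    if match is None:
        raise ValueError(f"no rate found in {text!r}")
    return float(match.group(1))


def fetch_rate(site: Site, username: str, password: str) -> float:
    """Log in through a real Chromium instance and read the quoted rate."""
    # Imported lazily so the pure helpers work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        # One context keeps cookies and tokens across every page it opens.
        context = browser.new_context()
        page = context.new_page()
        page.goto(site.login_url)
        page.fill(site.user_selector, username)
        page.fill(site.password_selector, password)
        page.click(site.submit_selector)
        page.wait_for_load_state("networkidle")
        # Site-specific fill/click calls for the mortgage data would go here.
        text = page.inner_text(site.rate_selector)
        browser.close()
    return parse_rate(text)


def best_offer(rates: dict[str, float]) -> tuple[str, float]:
    """Return the (site name, rate) pair with the lowest rate."""
    name = min(rates, key=rates.get)
    return name, rates[name]
```

You would call fetch_rate once per site, collect the results in a dict keyed by site name, and pass that dict to best_offer. Because the browser context carries the session from page to page, you never have to work out which tokens each page expects.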