
PYTHON 3 - How to web scrape a password-protected website?

I'm trying to access a website at work, but it is protected by a username and password. The username/password pop-up looks like the one in the picture (Login image). My code for viewing the website is below. I can see HTML code in the response, but it comes with the error "401 Authorization Required". Can you please help?

import requests
from bs4 import BeautifulSoup as bs

# Request the page; the (user, password) tuple is sent as HTTP Basic auth
r = requests.get("http://10.75.19.101/mfgindex", auth=('root', 'password'))

# Convert to beautiful soup object
soup = bs(r.content, features="html.parser")

# Print the parsed HTML
print(soup.prettify())

Generally, if a site is password-protected, you can't bypass the login procedure. In practice that means logging in for real, for example with an RPA-style process in which your code controls a web browser, logs in with a real username and password, browses to the pages you need, and then extracts the required elements from the HTML using BeautifulSoup.

For that, I suggest trying Selenium ( https://www.selenium.dev/ ).
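The browser-driven flow could be sketched roughly as follows. This is a minimal sketch, not a tested recipe for this particular site. It assumes Chrome is installed locally, and it assumes the pop-up is the browser's native HTTP authentication dialog, which Selenium cannot fill in like an ordinary form. The workaround shown, putting the credentials into the URL, is an assumption that depends on the browser still accepting that form. The helper names `url_with_credentials` and `fetch_rendered_html` are made up for illustration.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def url_with_credentials(url, username, password):
    """Return url with percent-encoded credentials placed before the host."""
    parts = urlsplit(url)
    # Encode every reserved character so '@', ':' or '/' in a password
    # cannot be mistaken for URL syntax.
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    netloc = f"{userinfo}@{parts.netloc}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def fetch_rendered_html(url, username, password):
    """Open the page in a real browser and return the HTML it rendered."""
    # Imported here so the URL helper above works without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: Chrome is available locally
    try:
        # Assumption: the browser accepts credentials embedded in the URL
        # for a top-level navigation.
        driver.get(url_with_credentials(url, username, password))
        return driver.page_source
    finally:
        # Always close the browser, even if loading the page fails.
        driver.quit()
```

The string returned by `fetch_rendered_html("http://10.75.19.101/mfgindex", "root", "password")` can then be passed to `bs(html, features="html.parser")` exactly as in the question's code.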

A short tutorial is here:

https://medium.com/ymedialabs-innovation/web-scraping-using-beautiful-soup-and-selenium-for-dynamic-page-2f8ad15efe25

I used it to solve a similar task some time ago and it worked well.
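Before setting up a browser, it may be worth checking which authentication scheme the server asks for. A 401 response normally carries a `WWW-Authenticate` header naming the scheme, and the `(user, password)` tuple passed to `requests.get` is sent as HTTP Basic auth, so a server that expects Digest would keep answering 401. The sketch below only handles Basic and Digest and only looks at the first scheme in the header; other schemes such as NTLM are not covered. The function names are hypothetical, and whether this applies to the site in the question is an assumption.

```python
def first_auth_scheme(www_authenticate):
    """Return the lower-cased first scheme name from a WWW-Authenticate header."""
    parts = www_authenticate.split(None, 1)
    return parts[0].lower() if parts else ""


def fetch_with_matching_auth(url, username, password):
    """Probe the URL, then retry with the auth type the server advertises."""
    # Imported here so first_auth_scheme works without requests installed.
    import requests
    from requests.auth import HTTPBasicAuth, HTTPDigestAuth

    # First request without credentials to see what the server asks for.
    probe = requests.get(url, timeout=10)
    if probe.status_code != 401:
        return probe  # no HTTP authentication challenge was issued

    scheme = first_auth_scheme(probe.headers.get("WWW-Authenticate", ""))
    if scheme == "digest":
        auth = HTTPDigestAuth(username, password)
    elif scheme == "basic":
        auth = HTTPBasicAuth(username, password)
    else:
        raise ValueError(f"unsupported authentication scheme: {scheme!r}")

    # Second request with credentials in the advertised scheme.
    return requests.get(url, auth=auth, timeout=10)
```

If `fetch_with_matching_auth` still returns a 401, the credentials themselves or a scheme not handled here are the more likely cause, and the browser-based route remains the fallback.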

