
How to spider a password-protected site in Python?

Original post: 2010-07-04 01:39:48 · Tags: python, web-crawler

Currently I have a spider written in Java (using HtmlUnit) that logs into a supplier website and crawls it.

It keeps the session (cookie) and even lets me enable/disable JavaScript, etc.

I also use htmlparser (Java) to help parse the HTML and extract the relevant information.

Does Python have something similar for this?

2 answers

Python has urllib2 for fetching pages; it supports password authentication and cookies.
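As a rough sketch of that approach on Python 3, where the urllib2 functionality lives in `urllib.request` and cookie handling in `http.cookiejar`: the function names, the example URL, and the form field names below are hypothetical, and whether a given site expects HTTP Basic auth or a login form is an assumption you would need to check.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session_opener(base_url, username, password):
    """Return (opener, cookie_jar). The opener stores cookies between
    requests and answers HTTP Basic auth challenges under base_url."""
    cookie_jar = http.cookiejar.CookieJar()
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # realm=None means "use these credentials for any realm under base_url".
    password_mgr.add_password(None, base_url, username, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cookie_jar),
        urllib.request.HTTPBasicAuthHandler(password_mgr),
    )
    return opener, cookie_jar


def login_with_form(opener, login_url, fields):
    """POST a login form (for sites that use a form instead of Basic auth).
    Any session cookie the server sets ends up in the opener's cookie jar."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with opener.open(login_url, data=data, timeout=30) as response:
        return response.read()


# Hypothetical usage (URL and field names are placeholders, not a real site):
# opener, jar = make_session_opener("https://supplier.example.com/", "me", "secret")
# login_with_form(opener, "https://supplier.example.com/login",
#                 {"username": "me", "password": "secret"})
# html = opener.open("https://supplier.example.com/catalog").read()
```

Reusing the same opener for every request is what keeps the session alive, since the cookie jar is attached to it.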

There is also an HTMLParser for extracting HTML, but some people prefer the more full-featured BeautifulSoup.
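For illustration, here is a minimal link extractor built on the standard library's HTMLParser (`html.parser` on Python 3). The class name, function name, and sample URLs are made up for this sketch; a real spider would also need to filter and de-duplicate the links it follows.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href (or with an empty one).
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return absolute URLs for all links found in the HTML string."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

With BeautifulSoup installed (a third-party package), roughly the same result should come from `BeautifulSoup(html, "html.parser").find_all("a", href=True)`, which also tends to cope better with badly formed markup.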

The Scrapy API uses urllib2 plus a few different parsers and helper routines.


Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address.
