
Web Scraping using Python (Generic URLs)

Original post: 2019-11-14 15:47:16 · Tags: python, python-3.x, web-scraping

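I have a set of URLs spanning multiple websites. Can someone suggest a Python library (preferably with a GitHub link to a project) for scraping them? I initially used PRAW to extract data from Reddit, but I would like to develop generic Python code that can extract various tags from any website, given its URL.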

2 Answers

If you want a simple, lightweight library, you can use BeautifulSoup:

from bs4 import BeautifulSoup

# Parse an HTML snippet and read the href attribute of the first <a> tag
doc = "<a href='https://google.com'>Google</a>"
soup = BeautifulSoup(doc, 'html.parser')
URL = soup.find('a').get('href')  # 'https://google.com'
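
To connect this to the question (any URL, any set of tags), here is a minimal sketch of a generic extractor built on BeautifulSoup. The helper names `fetch_html`, `extract_tags`, and `scrape`, the User-Agent string, and the choice of `urllib` for downloading are assumptions made for illustration; they are not part of the original answer, and many people would use `requests` for the download step instead.

```python
from collections import defaultdict
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (hypothetical helper)."""
    # Some sites reject requests without a User-Agent; this value is a placeholder.
    request = Request(url, headers={"User-Agent": "generic-scraper-sketch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_tags(html, tag_names):
    """Return {tag name: [text of each matching element]} for one page."""
    soup = BeautifulSoup(html, "html.parser")
    found = defaultdict(list)
    # find_all accepts a list of tag names and yields matches in document order
    for element in soup.find_all(tag_names):
        found[element.name].append(element.get_text(" ", strip=True))
    return dict(found)


def scrape(urls, tag_names):
    """Run extract_tags over several URLs, recording failures per URL."""
    results = {}
    for url in urls:
        try:
            results[url] = extract_tags(fetch_html(url), tag_names)
        except OSError as error:  # URLError and timeouts are OSError subclasses
            results[url] = {"error": str(error)}
    return results
```

Calling `scrape(list_of_urls, ["h1", "p", "a"])` would return one dictionary per URL. This only sees the HTML the server sends, so pages that build their content with JavaScript would need a different approach.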

There are other options as well, such as the Scrapy framework.

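Scrapy is arguably the most popular scraping framework, and it is easy and fun to use. Focus on XPath rather than CSS selectors: XPath gives you more ways to select elements, and it is simple and precise.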
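
As a rough illustration of what XPath can select, here is a small sketch. It uses `lxml` directly rather than Scrapy so that it runs without a project or a network connection; Scrapy's `response.xpath()` takes XPath expressions of the same kind. The sample HTML, its class names, and the variable names are made up for this example.

```python
from lxml import html as lxml_html

# Made-up page fragment, used only for this illustration
PAGE = """
<html><body>
  <div class="quote"><span class="text">First quote</span>
    <small class="author">Author A</small></div>
  <div class="quote"><span class="text">Second quote</span>
    <small class="author">Author B</small></div>
  <ul class="pager"><li><a href="/page/1/">Previous</a></li>
    <li><a href="/page/3/">Next</a></li></ul>
</body></html>
"""

tree = lxml_html.fromstring(PAGE)

# Attribute match, comparable to the CSS selector "div.quote span.text"
quotes = tree.xpath('//div[@class="quote"]/span[@class="text"]/text()')

# Match on the link's visible text; standard CSS selectors have no equivalent
next_href = tree.xpath('//a[contains(text(), "Next")]/@href')

# Walk from a child back up to its parent: the quote written by "Author B"
by_author = tree.xpath(
    '//small[text()="Author B"]/parent::div/span[@class="text"]/text()'
)

print(quotes)     # ['First quote', 'Second quote']
print(next_href)  # ['/page/3/']
print(by_author)  # ['Second quote']
```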

The tutorial on scraping quotes can be your first try!

Here are some useful links to get started:

All the best!


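Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the address of the original post. For any questions, contact: yoyou2525@163.com.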
