Web Scraping using Python (Generic URLs)

Posted 2019-11-14 15:47:16 · Tags: python, python-3.x, web-scraping

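I have a set of URLs from several different websites. Can someone suggest a Python library (ideally with a link to the project's GitHub) for scraping them? I originally used PRAW to extract data from Reddit, but I want to write generic Python code that can extract various tags from any website given its URL.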

2 Answers

If you want a simple, lightweight library, you can use BeautifulSoup:

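from bs4 import BeautifulSoup

# Parse a small HTML snippet with Python's built-in parser
doc = "<a href='https://google.com'>Google</a>"
soup = BeautifulSoup(doc, 'html.parser')

# Take the first <a> tag and read its href attribute
URL = soup.find('a').get('href')  # 'https://google.com'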
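To apply this to the question's actual case (a list of URLs from different sites, pulling out arbitrary tags), a minimal sketch might look like the following. It assumes pages can be downloaded with the standard library's `urllib.request` and that the visible text of each matching element is what you want. The helper names `fetch_html`, `extract_tags` and `scrape_urls` are hypothetical, and the `fetch` parameter lets you swap in another downloader (for example `requests`) or a stub for testing.

```python
from urllib.request import Request, urlopen

from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and decode it as text (hypothetical helper)."""
    # Some sites reject urllib's default User-Agent, so send a browser-like one.
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def extract_tags(html, tag_names):
    """Return {tag_name: [stripped text of each matching element]} for one page."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        name: [el.get_text(strip=True) for el in soup.find_all(name)]
        for name in tag_names
    }


def scrape_urls(urls, tag_names, fetch=fetch_html):
    """Run extract_tags on every URL; per-URL failures are recorded, not raised."""
    results = {}
    for url in urls:
        try:
            results[url] = extract_tags(fetch(url), tag_names)
        except (OSError, ValueError, LookupError) as exc:
            # OSError covers URLError/timeouts, ValueError covers "unknown url type",
            # LookupError covers an unknown charset name sent by the server.
            results[url] = {"error": str(exc)}
    return results
```

With network access, `scrape_urls(["https://example.com", "https://www.python.org"], ["title", "h1"])` would return one dict per URL. Note that this only sees the HTML the server sends; content that a site renders with JavaScript will not appear, which is where a browser-driven tool such as Selenium comes in.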

There are other options as well, such as the Scrapy framework.

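Scrapy is the most popular, simplest and most fun scraping framework. Focus on XPath rather than CSS selectors: XPath gives you more options while staying simple and precise.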

This quote-scraping tutorial makes a good first project!

Here are some useful links to get started:

Good luck!


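Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.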


Guangdong ICP No. 18138465 © 2020-2024 STACKOOM.COM