
Scrape websites with Python

Asked 2020-06-09 16:59:48. Tags: python, web-scraping, beautifulsoup, scrapy, libraries

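I have just started with Python. I am trying to scrape a website to get prices and titles from it. I went through several tutorials and blogs, and the most common libraries are Beautiful Soup and Scrapy. My question: is there any way to scrape a website without using any third-party library such as beautifulsoup or scrapy? Built-in libraries are fine. Please recommend a blog, article, or tutorial so that I can learn.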

2 answers

You can use urllib instead of Scrapy.

You can use regex instead of BeautifulSoup.

But Scrapy and BeautifulSoup make your life easier.

Scrapy is not an easy library to learn, so you can use requests or urllib instead.
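To make the urllib plus regex suggestion concrete, here is a minimal sketch that uses only the standard library. The markup it expects (an `h2` with class `title` followed by a `span` with class `price`), the `SAMPLE_HTML` snippet, and the function names are assumptions for illustration, not something taken from the asker's site. Regex matching like this is brittle and will need adjusting to the real page.

```python
import re
from urllib.request import Request, urlopen

# Hypothetical markup: each product is assumed to look like
# <h2 class="title">Name</h2> ... <span class="price">$9.99</span>
PRODUCT_PATTERN = re.compile(
    r'<h2 class="title">(?P<title>.*?)</h2>'
    r'.*?<span class="price">(?P<price>.*?)</span>',
    re.DOTALL,
)


def fetch_html(url, timeout=10):
    """Download a page using only the standard library."""
    # Some servers reject requests that carry no User-Agent header.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_products(page_source):
    """Return (title, price) pairs matched by PRODUCT_PATTERN."""
    return [
        (match.group("title").strip(), match.group("price").strip())
        for match in PRODUCT_PATTERN.finditer(page_source)
    ]


# Made-up sample so the parsing step can be tried without network access.
SAMPLE_HTML = """
<div class="product"><h2 class="title">Blue Mug</h2>
  <span class="price">$9.99</span></div>
<div class="product"><h2 class="title">Red Mug</h2>
  <span class="price">$12.50</span></div>
"""

print(extract_products(SAMPLE_HTML))
```

On a real site you would call `extract_products(fetch_html(url))` with the page's actual URL and a pattern written for its actual HTML.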

I think the best, most popular, and easiest libraries to learn and use for web scraping in Python are requests, lxml, and BeautifulSoup (whose latest version is bs4). In short, Requests lets us make HTTP requests to the website's server to retrieve the data on its pages. Getting the HTML content of a web page is the first step of web scraping.

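Let's look at the pros and cons of the Requests Python library.

Pros:

Simple
Basic/Digest authentication
International domains and URLs
Chunked requests
HTTP(S) proxy support

Cons:

Retrieves only the static content of a page
Cannot be used to parse HTML
Cannot handle websites built purely with JavaScript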
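As a rough illustration of that first step, the sketch below fetches a page with Requests. The function name `fetch_html`, the optional `session` parameter, and the User-Agent string are my own choices for the example. `requests.get`, `raise_for_status()`, and `.text` are part of the library.

```python
import requests


def fetch_html(url, session=None, timeout=10):
    """Return the static HTML of a page, raising on HTTP errors.

    `session` may be a requests.Session (handy for auth or proxies);
    when omitted, the module-level requests.get is used.
    """
    http = session if session is not None else requests
    response = http.get(
        url,
        timeout=timeout,  # avoid hanging forever on a slow server
        headers={"User-Agent": "price-scraper-example/0.1"},  # made-up value
    )
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return response.text  # decoded text; it still needs an HTML parser


# Example call (needs network access, so it is left commented out):
# page_source = fetch_html("https://example.com/")
```

This only returns what the server sends. Content that JavaScript adds later in the browser will not be in the returned text, which is the limitation listed under the cons.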

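We know that the Requests library cannot parse the HTML retrieved from a web page. Therefore, we need lxml, a high-performance, very fast, production-quality HTML and XML parsing library for Python.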

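Now let's look at the pros and cons of the lxml Python library.

Pros:

Faster than most parsers out there
Lightweight
Uses element trees
Pythonic API

Cons:

Does not work well with poorly designed HTML
The official documentation is not very beginner-friendly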
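Here is a small sketch of parsing with lxml and XPath. The sample HTML and the class names `product`, `title`, and `price` are invented for the example. A real page will need its own XPath expressions.

```python
from lxml import html

# Made-up sample page; real sites will use different markup.
SAMPLE_HTML = """
<html><body>
  <div class="product">
    <h2 class="title">Blue Mug</h2><span class="price">$9.99</span>
  </div>
  <div class="product">
    <h2 class="title">Red Mug</h2><span class="price">$12.50</span>
  </div>
</body></html>
"""


def extract_products(page_source):
    """Return (title, price) pairs using XPath on an lxml element tree."""
    tree = html.fromstring(page_source)
    products = []
    for node in tree.xpath('//div[@class="product"]'):
        # string(...) returns the text content of the first matching node.
        title = node.xpath('string(.//h2[@class="title"])').strip()
        price = node.xpath('string(.//span[@class="price"])').strip()
        products.append((title, price))
    return products


print(extract_products(SAMPLE_HTML))
```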

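BeautifulSoup is perhaps the most widely used Python library for web scraping. It creates a parse tree for parsing HTML and XML documents. Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8.

One of the main advantages of the Beautiful Soup library is that it works very well with poorly designed HTML and has a lot of functions. The combination of Beautiful Soup and Requests is quite common in the industry.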

Pros:

Requires only a few lines of code
Great documentation
Easy for beginners to learn
Robust
Automatic encoding detection

Cons:

Slower than lxml
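For comparison, a minimal Beautiful Soup sketch of the same kind of extraction. The sample HTML and the class names are again assumptions for illustration. It uses the built-in `html.parser` backend, so nothing beyond `bs4` needs to be installed.

```python
from bs4 import BeautifulSoup

# Made-up sample page; real sites will use different markup.
SAMPLE_HTML = """
<div class="product">
  <h2 class="title">Blue Mug</h2><span class="price">$9.99</span>
</div>
<div class="product">
  <h2 class="title">Red Mug</h2><span class="price">$12.50</span>
</div>
"""


def extract_products(page_source):
    """Return (title, price) pairs found in the parsed document."""
    soup = BeautifulSoup(page_source, "html.parser")
    products = []
    for card in soup.find_all("div", class_="product"):
        title = card.find(class_="title")
        price = card.find(class_="price")
        if title is None or price is None:
            continue  # skip cards that lack either field
        products.append(
            (title.get_text(strip=True), price.get_text(strip=True))
        )
    return products


print(extract_products(SAMPLE_HTML))
```

In practice the string passed to `extract_products` would be the HTML downloaded with Requests or urllib.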

If you want to learn how to scrape web pages with Beautiful Soup, this tutorial is for you:

By the way, there are many other options you can try for web scraping, such as Scrapy and Selenium, as well as regular expressions and urllib.


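Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you republish, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.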

