
Scrape websites with Python

I have just started with Python. I am trying to scrape a website to fetch the price and title from it. I have gone through multiple tutorials and blogs, and the most common libraries are beautiful soup and scrapy . My question is: is there any way to scrape a website without using any 3rd party library like beautifulsoup or scrapy , using only built-in libraries? Please suggest a blog, article, or tutorial so that I can learn.

Instead of using scrapy you can use urllib .

Instead of beautifulsoup you can use regex .

But scrapy and beautifulsoup make your life easier.
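A minimal sketch of that approach using only the standard library ( urllib.request to download the page and re to pull out a title and a price); the URL and the price pattern are placeholders, not taken from the question:

```python
import re
import urllib.request

url = "https://example.com/product"  # placeholder URL

# Fetch the raw HTML with the built-in urllib module.
with urllib.request.urlopen(url) as response:
    html = response.read().decode("utf-8", errors="replace")

# Extract the <title> tag contents with a regular expression.
title_match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
title = title_match.group(1).strip() if title_match else None

# Hypothetical price pattern: matches something like "$19.99" in the page text.
price_match = re.search(r"\$\d+(?:\.\d{2})?", html)
price = price_match.group(0) if price_match else None

print(title, price)
```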

Scrapy is not an easy library, so you can use requests or urllib instead.

I think the best, most popular, and easiest to learn and use libraries for web scraping in Python are requests, lxml and BeautifulSoup (whose latest version is bs4). In summary, Requests lets us make HTTP requests to the website's server to retrieve the data on its page. Getting the HTML content of a web page is the first and foremost step of web scraping; a short sketch of this step follows the list below.

Let's take a look at the advantages and disadvantages of the Requests Python library.

Advantages:

  • Simple
  • Basic/Digest Authentication
  • International Domains and URLs
  • Chunked Requests
  • HTTP(S) Proxy Support

Disadvantages:

  • Retrieves only static content of a page
  • Can't be used for parsing HTML
  • Can't handle websites made purely with JavaScript
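A short sketch of that first step with requests (a third-party package installed with pip install requests); the URL below is a placeholder:

```python
import requests

url = "https://example.com/product"  # placeholder URL

response = requests.get(url, timeout=10)
response.raise_for_status()          # raise an error on 4xx/5xx responses

html = response.text                 # the page's HTML as a string
print(html[:200])                    # preview the first 200 characters
```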

We know the requests library cannot parse the HTML retrieved from a web page. Therefore, we require lxml, a high-performance, blazingly fast, production-quality HTML and XML parsing Python library.

Let's take a look at the advantages and disadvantages of the lxml Python library.

Advantages:

  • Faster than most of the parsers out there
  • Light-weight
  • Uses element trees
  • Pythonic API

Disadvantages:

  • Does not work well with poorly designed HTML
  • The official documentation is not very beginner-friendly
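A minimal sketch of parsing a fetched page with lxml; the URL and the XPath expressions are assumptions about the target page, not part of the original answer:

```python
import requests
from lxml import html

url = "https://example.com/product"  # placeholder URL

page = requests.get(url, timeout=10)
tree = html.fromstring(page.content)  # build an element tree from the HTML

# Hypothetical XPath expressions -- adjust them to the page's real structure.
titles = tree.xpath("//title/text()")
prices = tree.xpath('//span[@class="price"]/text()')

print(titles, prices)
```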

BeautifulSoup is perhaps the most widely used Python library for web scraping. It creates a parse tree for parsing HTML and XML documents. Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8.

One major advantage of the Beautiful Soup library is that it works very well with poorly designed HTML and has a lot of functions. The combination of Beautiful Soup and Requests is quite common in the industry.

Advantages:

  • Requires a few lines of code
  • Great documentation
  • Easy to learn for beginners
  • Robust
  • Automatic encoding detection

Disadvantages:

  • Slower than lxml
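A minimal sketch of the requests + Beautiful Soup combination mentioned above; the URL and the price tag's class name are hypothetical and need to be adapted to the real page:

```python
import requests
from bs4 import BeautifulSoup

url = "https://example.com/product"  # placeholder URL

page = requests.get(url, timeout=10)
soup = BeautifulSoup(page.text, "html.parser")

# Title from the <title> tag, if present.
title = soup.title.string if soup.title else None

# Hypothetical price element: a <span class="price"> somewhere on the page.
price_tag = soup.find("span", class_="price")
price = price_tag.get_text(strip=True) if price_tag else None

print(title, price)
```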

If you want to learn how to scrape web pages using Beautiful Soup, this tutorial is for you:

tutorial

By the way, there are many other libraries you can try, such as Scrapy, Selenium, regex, and urllib.


 