
Web scrape multiple web pages from a website using BeautifulSoup and requests in Python

I want to scrape multiple web pages on a website. Right now my code can scrape reviews from the first page, and I would like it to scrape reviews from the following pages as well, in this example up to page 8. This is the link to the website: https://www.mouthshut.com/product-reviews/Kotak-811-Mobile-Banking-reviews-925917218

import requests
from bs4 import BeautifulSoup
import pandas as pd

URL = "https://www.mouthshut.com/product-reviews/Kotak-811-Mobile-Banking-reviews-925917218"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')
reviews = []  # a list to store reviews

# Use a CSS selector to extract all the review containers
review_divs = soup.select('div.col-10.review')
for element in review_divs:
    review = {'Review_Title': element.a.text,
              'URL': element.a['href'],
              'Review': element.find('div', {'class': ['more', 'reviewdata']}).text.strip()}
    reviews.append(review)

df = pd.DataFrame(reviews)
print(df)

I want to store all the reviews from the 8 pages in one dataframe df. I would appreciate the help. Thank you.

Switch to the next page after scraping all reviews from the first page, and repeat until you have all the reviews. Just make your program click on the "next page" arrow at the bottom to proceed, as in the sketch below.
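
If you want to do it by actually clicking, a browser automation tool such as Selenium can drive the page for you. Below is a minimal sketch under the assumption that the next-page arrow can be located with a CSS selector; the selector 'li.next a' is only a placeholder, so inspect the page and substitute the real one.

import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

URL = "https://www.mouthshut.com/product-reviews/Kotak-811-Mobile-Banking-reviews-925917218"

driver = webdriver.Chrome()  # Selenium 4.6+ manages the driver itself; older versions need chromedriver on PATH
driver.get(URL)

reviews = []
for _ in range(8):  # this product has 8 pages of reviews
    soup = BeautifulSoup(driver.page_source, 'html5lib')
    for element in soup.select('div.col-10.review'):
        reviews.append({'Review_Title': element.a.text,
                        'URL': element.a['href'],
                        'Review': element.find('div', {'class': ['more', 'reviewdata']}).text.strip()})
    try:
        # 'li.next a' is a hypothetical selector for the "next page" arrow -- check the actual markup
        driver.find_element(By.CSS_SELECTOR, 'li.next a').click()
        time.sleep(2)  # give the next page a moment to load
    except Exception:
        break  # no clickable next-page arrow, so stop

driver.quit()
df = pd.DataFrame(reviews)
print(df)

That said, since the page URLs follow a predictable pattern here, the plain requests loop in the answer below is simpler and avoids running a browser.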

So this is the first page: https://www.mouthshut.com/product-reviews/Kotak-811-Mobile-Banking-reviews-925917218 . The rest of the pages have -page-x appended to the end of the URL, so you can just use a for loop in your script, like this.

import requests
from bs4 import BeautifulSoup
import pandas as pd

reviews = []  # collect the reviews from every page in a single list

for x in range(1, 9):
    # Page 1 has no suffix; pages 2 to 8 end in -page-x
    if x == 1:
        URL = "https://www.mouthshut.com/product-reviews/Kotak-811-Mobile-Banking-reviews-925917218"
    else:
        URL = "https://www.mouthshut.com/product-reviews/Kotak-811-Mobile-Banking-reviews-925917218-page-{}".format(x)

    r = requests.get(URL)
    soup = BeautifulSoup(r.content, 'html5lib')

    # Use a CSS selector to extract all the review containers on this page
    review_divs = soup.select('div.col-10.review')
    for element in review_divs:
        review = {'Review_Title': element.a.text,
                  'URL': element.a['href'],
                  'Review': element.find('div', {'class': ['more', 'reviewdata']}).text.strip()}
        reviews.append(review)

# Build a single dataframe from the reviews of all 8 pages
df = pd.DataFrame(reviews)
print(df)
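
If you would rather keep one dataframe per page, a possible variant (same selector and URL pattern as above, plus a short pause so you don't hammer the server) is to collect the per-page frames in a list and join them at the end with pd.concat:

import time

import requests
import pandas as pd
from bs4 import BeautifulSoup

BASE = "https://www.mouthshut.com/product-reviews/Kotak-811-Mobile-Banking-reviews-925917218"
frames = []  # one dataframe per page

for x in range(1, 9):
    URL = BASE if x == 1 else "{}-page-{}".format(BASE, x)
    soup = BeautifulSoup(requests.get(URL).content, 'html5lib')
    page_reviews = [{'Review_Title': el.a.text,
                     'URL': el.a['href'],
                     'Review': el.find('div', {'class': ['more', 'reviewdata']}).text.strip()}
                    for el in soup.select('div.col-10.review')]
    frames.append(pd.DataFrame(page_reviews))
    time.sleep(1)  # be polite between requests

df = pd.concat(frames, ignore_index=True)  # one dataframe with all the reviews
print(df)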
