I want to scrape multiple web pages on a website. Right now my code can scrape reviews from the first page only. I would like it to scrape reviews from the following pages as well; in this example, up to page 8. This is the link to the website: https://www.mouthshut.com/product-reviews/Kotak-811-Mobile-Banking-reviews-925917218
import requests
from bs4 import BeautifulSoup
import pandas as pd
import csv
URL = "https://www.mouthshut.com/product-reviews/Kotak-811-Mobile-Banking-reviews-925917218"
r = requests.get(URL)
soup = BeautifulSoup(r.content, 'html5lib')
reviews = [] # a list to store reviews
# Use a CSS selector to extract all the review containers
review_divs = soup.select('div.col-10.review')
for element in review_divs:
    review = {
        'Review_Title': element.a.text,
        'URL': element.a['href'],
        'Review': element.find('div', {'class': ['more', 'reviewdata']}).text.strip(),
    }
    reviews.append(review)
df = pd.DataFrame(reviews)
print(df)
I want to store all reviews from 8 pages in one dataframe df. I would appreciate the help. Thank You
After scraping all reviews from the first page, switch to the next page and repeat until you have collected all the reviews. Just make your program click on the "next page" arrow at the bottom of the page to proceed.
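One way to read this suggestion without driving a browser is to keep requesting successive pages until one comes back with no reviews. The following is only a minimal sketch: `collect_all_reviews`, `fetch_reviews`, and `max_pages` are hypothetical names, and it assumes that a page past the last one yields an empty list of reviews, which has not been verified for this site.

```python
from typing import Callable, Dict, List


def collect_all_reviews(
    fetch_reviews: Callable[[int], List[Dict[str, str]]],
    max_pages: int = 50,
) -> List[Dict[str, str]]:
    """Gather reviews page by page until a page yields none.

    fetch_reviews is a hypothetical callable: given a page number,
    it returns that page's reviews as a list of dicts.
    max_pages is a safety cap so the loop always terminates.
    """
    all_reviews: List[Dict[str, str]] = []
    for page in range(1, max_pages + 1):
        page_reviews = fetch_reviews(page)
        if not page_reviews:
            # Assumed signal that we have gone past the last page.
            break
        all_reviews.extend(page_reviews)
    return all_reviews
```

In practice `fetch_reviews` would wrap the `requests.get` call and the BeautifulSoup parsing from the question, and the returned list can be passed straight to `pd.DataFrame`.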
The first page is https://www.mouthshut.com/product-reviews/Kotak-811-Mobile-Banking-reviews-925917218 . The remaining pages have -page-x appended to the end of the URL, where x is the page number. So you can just use a for loop in your script, like this:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import csv
reviews = []  # a list to store reviews from all pages
for x in range(1, 9):
    # Page 1 has no suffix; pages 2 to 8 end in -page-x
    if x == 1:
        URL = "https://www.mouthshut.com/product-reviews/Kotak-811-Mobile-Banking-reviews-925917218"
    else:
        URL = "https://www.mouthshut.com/product-reviews/Kotak-811-Mobile-Banking-reviews-925917218-page-{}".format(x)
    r = requests.get(URL)
    soup = BeautifulSoup(r.content, 'html5lib')
    # Use a CSS selector to extract all the review containers
    review_divs = soup.select('div.col-10.review')
    for element in review_divs:
        review = {'Review_Title': element.a.text, 'URL': element.a['href'], 'Review': element.find('div', {'class': ['more', 'reviewdata']}).text.strip()}
        reviews.append(review)
# Build the DataFrame once, after all pages have been scraped
df = pd.DataFrame(reviews)
print(df)
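The URL pattern described in this answer can also be factored into a small helper, which keeps the if/else out of the scraping loop. This is just a sketch: `build_page_urls` and `BASE_URL` are names made up for illustration, and the `-page-x` suffix is taken from the description above rather than from any documented API.

```python
from typing import List

BASE_URL = "https://www.mouthshut.com/product-reviews/Kotak-811-Mobile-Banking-reviews-925917218"


def build_page_urls(base_url: str, last_page: int) -> List[str]:
    """Return the URLs for pages 1..last_page.

    Page 1 is the bare URL; later pages get a -page-x suffix.
    """
    urls = []
    for page in range(1, last_page + 1):
        if page == 1:
            urls.append(base_url)
        else:
            urls.append(f"{base_url}-page-{page}")
    return urls


page_urls = build_page_urls(BASE_URL, 8)
```

The scraping loop can then iterate over `page_urls` directly, calling `requests.get(url)` for each entry.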