I am trying to scrape product data from Flipkart.com using BeautifulSoup and Selenium, but I am getting an error. I can't understand why this error occurs, and I have already tried many solutions available on the internet.
Is there a fix for this, or should I try another method of web scraping? Please help.
The website opens in the Selenium driver, but I am unable to get any data from the page, and I don't understand why.
Here is the code I am running:
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
#driver = webdriver.Chrome('/usr/local/bin/chromedriver')
driver = webdriver.Chrome(executable_path='chromedriver.exe')
products=[] #List to store name of the product
prices=[] #List to store price of the product
ratings=[] #List to store rating of the product
content=driver.get("https://www.flipkart.com/mobiles/pr?sid=tyy%2C4io&p%5B%5D=facets.brand%255B%255D%3DRealme&otracker=nmenu_sub_Electronics_0_Realme")
soup = BeautifulSoup(content, 'lxml')
print(soup)
for a in soup.findAll('div', attrs={'class':'bhgxx2 col-12-12'}):
    name=a.find('div', attrs={'class':'_3wU53n'})
    price=a.find('div', attrs={'class':'_1vC4OE _2rQ-NK'})
    rating=a.find('div', attrs={'class':'hGSR34'})
    products.append(name.text)
    prices.append(price.text)
    ratings.append(rating.text)
    print(rating.text)
df = pd.DataFrame({'Product Name':products,'Price':prices,'Rating':ratings})
print(df)
df.to_csv('products.csv', index=False, encoding='utf-8')
Here is the error I get on the command line:
Traceback (most recent call last):
  File "C:\MachineLearning\WebScraping\web.py", line 10, in <module>
    soup = BeautifulSoup(content, 'lxml')
  File "C:\Users\karti\AppData\Local\Programs\Python\Python37-32\lib\site-packages\bs4\__init__.py", line 267, in __init__
    elif len(markup) <= 256 and (
TypeError: object of type 'NoneType' has no len()
After using driver.get(url) to load the page, you have to use driver.page_source to get the page source. driver.get(url) does not return anything, so content is None, and passing None to BeautifulSoup is what raises the TypeError.
from selenium import webdriver
driver = webdriver.Chrome(executable_path='/path/to/chromedriver')
driver.get("https://www.flipkart.com/mobiles/pr?sid=tyy%2C4io&p%5B%5D=facets.brand%255B%255D%3DRealme&otracker=nmenu_sub_Electronics_0_Realme")
print(driver.page_source)
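To see why the original traceback mentions NoneType, here is a minimal sketch that needs neither Selenium nor a browser. The function load_page is a hypothetical stand-in for driver.get(url); it is only meant to show that a call which returns nothing evaluates to None, and that len(None) fails with the same message.

```python
def load_page(url):
    """Hypothetical stand-in for driver.get(url): it does its work
    as a side effect and has no return statement."""
    pass  # a function without a return statement returns None


content = load_page("https://example.com")
print(content is None)  # the assigned value is None, not page HTML

try:
    # BeautifulSoup calls len(markup) internally, as the traceback shows
    len(content)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

This is why the page HTML has to be read from driver.page_source after the page has loaded, rather than from the return value of driver.get(url).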
One more issue with your code is that the class bhgxx2 col-12-12 is used many times on that page, and some of those elements do not contain a product. For those, a.find(...) returns None, so accessing .text raises an AttributeError inside your for loop.
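One way to guard against this, as an alternative to catching the exception, is to check each lookup for None before reading .text. The sketch below uses a hypothetical helper, extract_fields, and a hypothetical SELECTORS mapping; neither is part of BeautifulSoup. The class names are the ones from the question and may no longer match the live site.

```python
def extract_fields(card, selectors):
    """Return {field: text} for one card, or None if any field is missing.

    `card` is assumed to behave like a BeautifulSoup Tag: card.find(...)
    returns an element with a .text attribute, or None when nothing matches.
    """
    row = {}
    for field, css_class in selectors.items():
        tag = card.find('div', attrs={'class': css_class})
        if tag is None:  # this wrapper div holds no product
            return None
        row[field] = tag.text
    return row


# Hypothetical mapping; class names copied from the question.
SELECTORS = {
    'Product Name': '_3wU53n',
    'Price': '_1vC4OE _2rQ-NK',
    'Rating': 'hGSR34',
}

# Intended use with a parsed page (soup is assumed to exist already):
# rows = []
# for a in soup.findAll('div', attrs={'class': 'bhgxx2 col-12-12'}):
#     row = extract_fields(a, SELECTORS)
#     if row is not None:
#         rows.append(row)
```

Skipping the whole card when any field is missing keeps the name, price, and rating lists the same length, which pd.DataFrame requires.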
A working version of your code:
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd
driver = webdriver.Chrome(executable_path='/path/to/chromedriver')
products = [] # List to store name of the product
prices = [] # List to store price of the product
ratings = [] # List to store rating of the product
driver.get("https://www.flipkart.com/mobiles/pr?sid=tyy%2C4io&p%5B%5D=facets.brand%255B%255D%3DRealme&otracker=nmenu_sub_Electronics_0_Realme")
soup = BeautifulSoup(driver.page_source, 'lxml')
for a in soup.findAll('div', attrs={'class':'bhgxx2 col-12-12'}):
    try:
        name = a.find('div', attrs={'class':'_3wU53n'})
        price = a.find('div', attrs={'class':'_1vC4OE _2rQ-NK'})
        rating = a.find('div', attrs={'class':'hGSR34'})
        products.append(name.text)
        prices.append(price.text)
        ratings.append(rating.text)
    except AttributeError:
        # Skip wrapper divs that do not contain a product
        pass
df = pd.DataFrame({'Product Name': products, 'Price': prices, 'Rating': ratings})
print(df)
df.to_csv('products.csv', index=False, encoding='utf-8')
Output:
     Price                      Product Name Rating
0   ₹5,999  Realme C2 (Diamond Black, 16 GB)    4.4
1   ₹5,999   Realme C2 (Diamond Blue, 16 GB)    4.4
2   ₹8,999    Realme 3 (Radiant Blue, 32 GB)    4.5
3   ₹8,999   Realme 3 (Dynamic Black, 32 GB)    4.5
4   ₹9,999   Realme 3 (Dynamic Black, 64 GB)    4.5
5  ₹10,999     Realme 3 (Diamond Red, 64 GB)    4.4
...