
Cannot retrieve the contents of a page using BeautifulSoup

I am learning BeautifulSoup and trying to load the contents of this webpage. I am trying to grab its contents by going progressively deeper into the HTML tags, following the structure I see with Inspect Element.

I used several code snippets to print the results and check whether I could retrieve the contents successfully.

The following snippets returned results:

from bs4 import BeautifulSoup
import requests

root = 'https://www.quora.com/topic/Graduate-Record-Examination-GRE-1'
r = requests.get(root)

soup = BeautifulSoup(r.text,'html.parser')

# The following all returned some results:

#1
a = soup.find_all('div',{'class':'feed'})
print(a)

#2
b = soup.find_all('div',{'class':'ContentWrapper'})
print(b)

#3
c = soup.find_all('div',{'class':'ContentWrapper'})
print(c)

#4
d = soup.find_all('div',{'class':'feed'})
print(d)

#5
e = soup.find_all('div',{'class':'TopicFeed'})
print(e)

But going one level deeper, the following returned nothing:

f = soup.find_all('div',{'class':'paged_list_wrapper'})
print(f)

It prints: []

The content/HTML inside <div class='paged_list_wrapper'> is not printed. Why?
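One way to narrow this down: `find_all` searches every descendant of the tree by default (`recursive=True`), so nesting depth by itself cannot make a match disappear. The sketch below parses a hypothetical, hard-coded HTML snippet (not Quora's real markup) that only mimics the nesting from the question; the deeply nested `paged_list_wrapper` div is still found. An empty list therefore suggests the div is absent from the HTML that `requests` actually received, even if it appears in the browser's Inspect Element view.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the nesting described in the question;
# it is NOT the real page source.
html = """
<div class="ContentWrapper">
  <div class="TopicFeed">
    <div class="feed">
      <div class="paged_list_wrapper">
        <p>First question</p>
        <p>Second question</p>
      </div>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all walks all descendants, so the nesting depth does not matter.
wrappers = soup.find_all("div", {"class": "paged_list_wrapper"})
print(len(wrappers))  # 1
print([p.get_text() for p in wrappers[0].find_all("p")])  # ['First question', 'Second question']

# class_= is an equivalent keyword form of the same attribute filter.
same = soup.find_all("div", class_="paged_list_wrapper")
print(same == wrappers)  # True

# Depth only matters if you explicitly restrict the search to direct children.
top_level = soup.find_all("div", {"class": "paged_list_wrapper"}, recursive=False)
print(len(top_level))  # 0
```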

The site may be configured to serve different pages depending on the User-Agent header. I ran into the same problem: the call returned an empty list. Adding a generic browser User-Agent to the request headers solved it for me.
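To see why the User-Agent matters, it helps to check what `requests` sends when no header is given. The sketch below uses `requests.utils.default_user_agent()` and a request that is only prepared, never sent (no network access), to compare the default header with the browser-like value used in the answer. The exact version number in the default string depends on the installed `requests` release.

```python
import requests

url = "https://www.quora.com/topic/Graduate-Record-Examination-GRE-1"

# The User-Agent requests uses when you pass no headers, e.g. "python-requests/2.x.y".
print(requests.utils.default_user_agent())

session = requests.Session()

# Build (but do not send) a plain GET, the same way requests.get(url) does internally.
default_req = session.prepare_request(requests.Request("GET", url))
print(default_req.headers["User-Agent"])

# The same request with the browser-like header from the answer;
# a per-request header overrides the session default.
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42."}
browser_req = session.prepare_request(requests.Request("GET", url, headers=headers))
print(browser_req.headers["User-Agent"])
```

A server that inspects this header can tell the first request apart from a browser, which is consistent with the behavior described above.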

from bs4 import BeautifulSoup
import requests

root = 'https://www.quora.com/topic/Graduate-Record-Examination-GRE-1'
# Send a browser-like User-Agent instead of the default "python-requests/..." one
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42.'}
r = requests.get(root, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
# find_all is the current name; findAll is the older camelCase alias
f = soup.find_all('div', {'class': 'paged_list_wrapper'})
print(f)
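If the script fetches several pages from the same site, one option is to set the header once on a `requests.Session` so that every request made through it carries the same User-Agent. This is a sketch under that assumption: `make_session` is a hypothetical helper name, and the request below is only prepared to show the header, not sent.

```python
import requests


def make_session(user_agent: str) -> requests.Session:
    """Hypothetical helper: return a Session whose requests all carry user_agent."""
    session = requests.Session()
    # Replaces the default "python-requests/..." value for every request on this session.
    session.headers.update({"User-Agent": user_agent})
    return session


ua = "Mozilla/5.0 (Macintosh; Intel Mac OS X x.y; rv:42.0) Gecko/20100101 Firefox/42."
session = make_session(ua)

# Prepared offline to show the header that would be sent; in real use you would
# call session.get(url) and pass r.text to BeautifulSoup as in the answer.
prepared = session.prepare_request(
    requests.Request("GET", "https://www.quora.com/topic/Graduate-Record-Examination-GRE-1")
)
print(prepared.headers["User-Agent"])
```

A Session also reuses the underlying connection between requests, which is a side benefit when scraping several pages.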

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact yoyou2525@163.com.


Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM