Python BeautifulSoup inconsistent result

I have been trying to learn a bit of Python, and I tried to write a small program that asks the user for a subreddit and then prints all the front-page headlines and links to the articles. Here is the code:

import requests
from bs4 import BeautifulSoup

subreddit = input('Type the subreddit you want to see : ')
link_visit = f'https://www.reddit.com/r/{subreddit}/'
print(link_visit)

base_url = link_visit
r = requests.get(base_url)
soup = BeautifulSoup(r.text, 'html.parser')

for article in soup.find_all('div', class_='top-matter'):
    headline = article.find('p', class_='title')
    print('HeadLine : ', headline.text)

    a = headline.find('a', href=True)
    link = a['href'].split('/domain')
    print('Link : ', link[0])

My problem is that sometimes it prints the desired result, and other times it does nothing: it only asks the user for the subreddit and prints the link to said subreddit.

Can someone explain why this is happening?

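Your request is intermittently being rejected by Reddit in order to conserve their resources. When that happens, the response contains no posts, so your loop finds nothing to print.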

When you detect the failing case, print out the HTML. I think you'll see something like this:

<h1>whoa there, pardner!</h1>

<p>we're sorry, but you appear to be a bot and we've seen too many requests
from you lately. we enforce a hard speed limit on requests that appear to come
from bots to prevent abuse.</p>

<p>if you are not a bot but are spoofing one via your browser's user agent
string: please change your user agent string to avoid seeing this message
again.</p>

<p>please wait 3 second(s) and try again.</p>

<p>as a reminder to developers, we recommend that clients make no
more than <a href="http://github.com/reddit/reddit/wiki/API">one
request every two seconds</a> to avoid seeing this message.</p>
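
One way to act on that message is to send your own user agent string, check the status code, and wait before retrying. The following is only a minimal sketch: the `fetch_html` helper, the `HEADERS` value, and the retry and wait defaults are my own assumptions, not anything Reddit prescribes, and I am assuming the rejection arrives with a non-200 status (usually 429).

```python
import time

import requests

# Hypothetical user agent string; replace it with one that describes your script.
HEADERS = {'User-Agent': 'python:subreddit-headlines:v0.1 (learning project)'}


def fetch_html(url, get=requests.get, retries=3, wait=2.0):
    """Return the page HTML, or None if every attempt was rejected.

    `get` defaults to requests.get; it is a parameter only so the function
    can be exercised without touching the network.
    """
    for attempt in range(retries):
        r = get(url, headers=HEADERS, timeout=10)
        if r.status_code == 200:
            return r.text
        # Failing case: show the status and the start of the HTML that came back.
        print(f'Attempt {attempt + 1} failed with status {r.status_code}')
        print(r.text[:300])
        # Pause before the next attempt (assumed default: two seconds).
        if attempt < retries - 1:
            time.sleep(wait)
    return None
```

In your program you would call `fetch_html(link_visit)` instead of `requests.get(base_url)`, and only build the `BeautifulSoup` object when the result is not `None`.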
