
Why does BeautifulSoup only scrape from half of my page?

I'm trying to scrape text from a website with BeautifulSoup. Everything works, except that it only returns text from the top part of the page:

Output:


Toggle navigation









Authors

Author Directory
Amazon’s Top 100
USA Today
New York Times
Author Interview
Author's Top 5 Tips
Authors Pool

Join as an Author


Books

Book Directory
Free Books
#1 Best Seller Books
Signed Paperback
Audio Books
Free with Kindle Unlimited
Limited Time Deals
Book Teaser
Feature Your Book


Quotes

Famous Quotes
Picture Quotes
Quotes By Author
Quotes By Topic
Quote of the Day



Contest

Cover of the Month Voting
Cover of the Month Winners
Cover of the Month Submission


Services

SERVICES
                    
Promotional Plans
Author Program
Feature Your Book

TOOLS
                    

Review GIF Maker

Seasonal Mockup Banner
Promotional GIF Banners


ABOUT
                    
About us
Blog
FAQ
Testimonial
Contact













Search



 SIGNUP
 LOGIN Join as an Author or Reader
Magic Tool

Magic Tool
Seasonal Mockup
Image Editor
Quotes Editor








Authors

Author Directory
Amazon’s Top 100
USA Today
New York Times
Author Interview
Author's Top 5 Tips
Authors Pool

Join as an Author


Books

Book Directory
Free Books
#1 Best Seller Books
Signed Paperback
Audio Books
Free with Kindle Unlimited
Limited Time Deals
Book Teaser
Feature Your Book


Quotes

Famous Quotes
Picture Quotes
Quotes By Author
Quotes By Topic
Quote of the Day



Contest

Cover of the Month Voting
Cover of the Month Winners
Cover of the Month Submission


Services

SERVICES
                    
Promotional Plans
Author Program
Feature Your Book

TOOLS
                    

Review GIF Maker

Seasonal Mockup Banner
Promotional GIF Banners


ABOUT
                    
About us
Blog
FAQ
Testimonial
Contact













Search



 SIGNUP
 LOGIN Join as an Author or Reader
Magic Tool

Magic Tool
Seasonal Mockup
Image Editor
Quotes Editor









Search



Process finished with exit code 0

I don't want text from that part of the page; I only want the right side. One possible cause is that the rest of the text on the site consists of links rather than regular text, but if so, how do I get the text of links? What am I doing wrong, and how can I scrape text from the right side of the page only?

My code:

import bs4 as bs
import urllib.request

source = urllib.request.urlopen('https://allauthor.com/quotes/').read()
soup = bs.BeautifulSoup(source,'lxml')
div = soup.div
for text in div.find_all("div"):
    print(text.text)

The content is served dynamically: the data comes from an additional POST request. Because a plain HTTP request does not render dynamic content the way a browser does, you won't get the expected data this way.

You could perform that POST request yourself and scrape the data from the response.
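A minimal sketch of why the loop in the question stops at the navigation bar: `soup.div` returns only the first `<div>` in the document, and markup that JavaScript adds later is never part of the downloaded HTML. The HTML string and the `quotes` id below are hypothetical stand-ins, not the real site's markup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: the navigation is in the
# static HTML, while the quotes container stays empty until JavaScript runs.
html = """
<html><body>
  <div id="nav"><div><a href="/authors">Authors</a></div><div><a href="/books">Books</a></div></div>
  <div id="quotes"></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# soup.div is shorthand for soup.find("div"): the first <div> only,
# so find_all() on it never looks outside the navigation block.
first_div = soup.div
nav_texts = [d.get_text(strip=True) for d in first_div.find_all("div")]

# Link text is read the same way as any other text.
link_texts = [a.get_text(strip=True) for a in soup.find_all("a")]

# The container a browser would populate is empty in the raw HTML.
quotes_text = soup.find("div", id="quotes").get_text(strip=True)

print(nav_texts)
print(link_texts)
print(repr(quotes_text))
```

So links are not the problem here: their text is available through `get_text()` like any other element, but only for elements that exist in the HTML the server actually sent.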

Example

Adjust the range() to fetch more; it is limited here for demonstration purposes:

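import requests
from bs4 import BeautifulSoup

quotes = []

# Each request returns 100 rows; widen the range to fetch more.
for start in range(0, 1000, 100):
    url = f'https://allauthor.com/getQuotesDirectory.php?start={start}&length=100&orderby=usersView desc'
    headers = {'user-agent': 'Mozilla/5.0', 'cookies': ''}
    # The endpoint answers with JSON; 'aaData' holds rows of HTML fragments.
    data = requests.post(url, headers=headers).json()['aaData']

    # Join all fragments into one string and parse it with an explicit parser.
    soup = BeautifulSoup(''.join(cell for row in data for cell in row), 'html.parser')

    for t in soup.select('div.quote-list'):
        quotes.append({
            'quote': t.a.text,
            'category': t.div.a.text,
            'views': t.div.span.text
        })

print(quotes)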

Output

[{'quote': 'May God shower his choicest blessings on you. wishing you happiness, good health and a great year ahead.',
  'category': 'Birthday',
  'views': ' 9,210'},
 {'quote': 'A mind all logic is like a knife all blade. It makes the hand bleed that uses it.',
  'category': ' Rabindranath Tagore',
  'views': ' 6,480'},
 {'quote': 'Reality of life When you give importance to people they think that you are always free But They dont understand that you make yourself available for them every time.',
  'category': 'New  Collection',
  'views': ' 6,171'},
 {'quote': 'Xcuse me, I found something under my shoes. Oh its your attitude.',
  'category': 'Attitude',
  'views': ' 6,024'},
 {'quote': 'Truth is I miss you. All the time, every second, every minute, every hour, every day.',
  'category': 'Missing You',
  'views': ' 5,472'},...]
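The scraped values carry stray whitespace, and the view counts are strings with thousands separators. A small sketch of a clean-up pass, assuming counts always look like `' 9,210'`; `clean_record` is a hypothetical helper and not part of the answer above.

```python
def clean_record(record):
    """Return a copy with whitespace normalized and views converted to int."""
    return {
        "quote": record["quote"].strip(),
        # Collapse runs of whitespace, e.g. 'New  Collection' -> 'New Collection'.
        "category": " ".join(record["category"].split()),
        # Assumes a comma-separated integer such as ' 6,480'.
        "views": int(record["views"].strip().replace(",", "")),
    }

# One record copied from the output above.
raw = {
    "quote": "A mind all logic is like a knife all blade. It makes the hand bleed that uses it.",
    "category": " Rabindranath Tagore",
    "views": " 6,480",
}

cleaned = clean_record(raw)
print(cleaned["category"], cleaned["views"])
```

Applied to the full list, this would be `[clean_record(q) for q in quotes]`; converting views to integers makes sorting and filtering by popularity straightforward.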


 