
Error while scraping website using BeautifulSoup

I'm trying to scrape song lyrics from Genius. I wrote the following function:

import requests
from bs4 import BeautifulSoup

def get_song_lyrics(link):

    response = requests.get(link)
    soup = BeautifulSoup(response.text, "html.parser")
    lyrics = soup.find("div",attrs={'class':'lyrics'}).find("p").get_text()
    return [i for i in lyrics.splitlines()]

I don't understand why this

get_song_lyrics('https://genius.com/Kanye-west-black-skinhead-lyrics')

returns:

AttributeError: 'NoneType' object has no attribute 'find'

while this:

get_song_lyrics('https://genius.com/Kanye-west-hold-my-liquor-lyrics')

correctly returns the song's lyrics. Both pages have the same layout. Can someone help me figure out why?
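The error itself comes from the chained calls: `soup.find()` returns `None` when no element matches, and calling `.find` on `None` raises exactly this `AttributeError`. A minimal offline sketch, using two made-up HTML snippets (not Genius's actual markup) and a hypothetical `lyrics_or_none` helper, shows the difference and a guarded version that returns `None` instead of crashing:

```python
from bs4 import BeautifulSoup

# Made-up snippets standing in for the two kinds of response (not real Genius markup)
OLD_LAYOUT = '<div class="lyrics"><p>line one\nline two</p></div>'
NEW_LAYOUT = '<div class="Lyrics__Container-abc">line one<br/>line two</div>'


def lyrics_or_none(html):
    """Return the lyric lines from div.lyrics > p, or None if either is missing."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", attrs={"class": "lyrics"})
    if div is None:  # find() returns None when nothing matches
        return None
    p = div.find("p")
    return p.get_text().splitlines() if p is not None else None


print(lyrics_or_none(OLD_LAYOUT))  # ['line one', 'line two']
print(lyrics_or_none(NEW_LAYOUT))  # None

# The original chained call fails on the second snippet:
try:
    BeautifulSoup(NEW_LAYOUT, "html.parser").find("div", attrs={"class": "lyrics"}).find("p")
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'find'
```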

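The page is served in two different versions of HTML: the older layout that your `div.lyrics` lookup expects, and a newer one where the lyrics are in `div` elements whose class starts with `Lyrics__Container`. When you get the newer version, `soup.find("div", attrs={'class': 'lyrics'})` returns `None` and the chained `.find("p")` raises the `AttributeError`. You can use this script to handle both versions: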

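import requests
from bs4 import BeautifulSoup


url = 'https://genius.com/Kanye-west-black-skinhead-lyrics'
# The 'lxml' parser requires the lxml package to be installed
soup = BeautifulSoup(requests.get(url).content, 'lxml')

# Match the newer layout (class starting with "Lyrics__Container")
# and the older layout (<p> inside .song_body-lyrics)
for tag in soup.select('div[class^="Lyrics__Container"], .song_body-lyrics p'):

    # Remove <i> wrappers but keep their text, then merge adjacent strings
    for i in tag.select('i'):
        i.unwrap()
    tag.smooth()

    t = tag.get_text(strip=True, separator='\n')
    if t:
        print(t)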

Prints:

[Produced By Daft Punk & Kanye West]
[Verse 1]
For my theme song (Black)
My leather black jeans on (Black)
My by-any-means on

...and so on.
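To check the selector logic without hitting the network, here is a minimal sketch that runs the same `select()` call against two hand-written snippets, one per layout. The snippets and the `extract_lyrics` name are assumptions for illustration, and it uses the built-in `"html.parser"` instead of `lxml` so no extra package is needed:

```python
from bs4 import BeautifulSoup

# Hand-written stand-ins for the two layouts (not the real Genius markup)
OLD_HTML = '<div class="song_body-lyrics"><p>[Verse 1]<br/>For my <i>theme</i> song</p></div>'
NEW_HTML = ('<div class="Lyrics__Container-sc-1">[Verse 1]<br/>For my theme song</div>'
            '<div class="Lyrics__Container-sc-1">My leather black jeans on</div>')


def extract_lyrics(html):
    """Collect lyric lines from either layout, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    # Either a class attribute starting with "Lyrics__Container",
    # or a <p> inside an element with class "song_body-lyrics"
    for tag in soup.select('div[class^="Lyrics__Container"], .song_body-lyrics p'):
        for i in tag.select("i"):
            i.unwrap()  # drop the <i> wrapper but keep its text
        tag.smooth()    # merge the now-adjacent text nodes into one string
        text = tag.get_text(strip=True, separator="\n")
        if text:
            lines.extend(text.splitlines())
    return lines


print(extract_lyrics(OLD_HTML))
print(extract_lyrics(NEW_HTML))
```

The `smooth()` call matters here: without it, the unwrapped `<i>` text stays a separate string, so `get_text(separator="\n")` would split "For my theme song" across three lines.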

I'm not sure what causes it, but it looks like BeautifulSoup sometimes finds the lyrics and sometimes doesn't, and this isn't due to your code. One workaround is to fetch the page again when the lookup fails, with a limit on the number of attempts:

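import requests
from bs4 import BeautifulSoup

def get_song_lyrics(link, retries=5):
    # Retry a bounded number of times instead of recursing without limit,
    # which could raise RecursionError if the lyrics div never appears
    for _ in range(retries):
        response = requests.get(link)
        soup = BeautifulSoup(response.text, "html.parser")
        try:
            lyrics = soup.find("div", attrs={'class': 'lyrics'}).find("p").get_text()
            return lyrics.splitlines()
        except AttributeError:
            continue  # div.lyrics (or its <p>) missing in this response; try again
    raise RuntimeError(f"no div.lyrics found at {link} after {retries} attempts")

get_song_lyrics('https://genius.com/Kanye-west-black-skinhead-lyrics')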
