Getting lyrics of a song from Genius with BeautifulSoup | Python 3.8

I am trying to get the lyrics for a song from Genius using BeautifulSoup, but when I try to print the lyrics I get no output. Here is my code:

import requests 
from bs4 import BeautifulSoup
songURL = requests.get("https://genius.com/Marshmello-and-bastille-happier-lyrics")
song = songURL.content
soup = BeautifulSoup(song, 'lxml')
lyrics = soup.find_all("section")
for lyr in lyrics:
    for lyr1 in lyrics.select("p"):
        print(lyr1.text)      

Why is this not working? Can somebody please look into this? I have been trying to do this for a while now.

It seems that the server returns two versions of the page: in one version the lyrics sit in tags with class="song_body-lyrics", in the other in tags with class="Lyrics__Container...".

This script tries to handle both cases:

import requests 
from bs4 import BeautifulSoup

url = 'https://genius.com/Marshmello-and-bastille-happier-lyrics'
soup = BeautifulSoup(requests.get(url).content, 'lxml')

for tag in soup.select('div[class^="Lyrics__Container"], .song_body-lyrics p'):
    t = tag.get_text(strip=True, separator='\n')
    if t:
        print(t)

Prints:

[Intro]
Lately, I've been, I've been thinking
I want you to be happier, I want you to be happier
[Verse 1]

...and so on.

import requests
from bs4 import BeautifulSoup

songURL = requests.get("https://genius.com/Marshmello-and-bastille-happier-lyrics")
soup = BeautifulSoup(songURL.content, 'lxml')

final_lyrics = []
# Only the older page layout has <div class="lyrics">; find() returns None otherwise.
lyrics = soup.find('div', {'class': "lyrics"})
if lyrics is not None:
    for p in lyrics.find_all('p'):
        final_lyrics.append(p.text)
        print(p.text)

You should collect all the text inside one specific div. You can find that div with your browser's devtools or view-source. Here the div is <div class='lyrics'>; what makes it unique is its class, 'lyrics', so we find that div in the HTML and then print all the text inside it.

import bs4 as bs
import urllib.request

source = urllib.request.urlopen('https://alirezaarabi.com/view-source_https___genius.com_Alessia-cara-ready-lyrics.html').read()

soup = bs.BeautifulSoup(source,'lxml')
print(soup.title.string)

for div in soup.find_all('div', class_='lyrics'):
    print(div.text)
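
The class-based lookup can also be tried offline, without fetching anything. The following is only a minimal sketch: SAMPLE_HTML is a hypothetical snippet modeled on the older page layout, extract_lyrics is an illustrative helper name, and the built-in 'html.parser' is used instead of 'lxml' so that no extra parser is needed. Passing separator="\n" to get_text keeps the <br> line breaks as separate lines.

```python
from bs4 import BeautifulSoup

# Hypothetical sample modeled on the older page layout (not fetched from Genius).
SAMPLE_HTML = """
<div class="song_body-lyrics">
  <h2>Happier Lyrics</h2>
  <div class="lyrics">
    <p>[Intro]<br>Lately, I've been, I've been thinking<br>I want you to be happier</p>
  </div>
</div>
"""


def extract_lyrics(html):
    """Return the lyric lines inside <div class="lyrics">, or [] if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", class_="lyrics")
    if container is None:
        # The page was served in a layout that has no <div class="lyrics">.
        return []
    # Each text node becomes its own line; whitespace-only nodes are dropped.
    text = container.get_text(separator="\n", strip=True)
    return text.split("\n")


lines = extract_lyrics(SAMPLE_HTML)
print("\n".join(lines))
```

Returning an empty list when the div is missing avoids the AttributeError you would otherwise get from calling a method on None when the server sends the other page version.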

If you take a look at the actual HTML source code, there are no section tags. Here's what the structure actually looks like:

<div class="song_body column_layout" initial-content-for="song_body">
  <div class="column_layout-column_span column_layout-column_span--primary">
    <div class="song_body-lyrics">
      
        <h2 class="text_label text_label--gray text_label--x_small_text_size u-top_margin">Happier Lyrics</h2>
      
      <div initial-content-for="lyrics">
        <div class="lyrics">
          
            <!--sse-->
            <p>[Intro]<br>
Lately, I've been, I've been thinking<br>
I want you to be happier, I want you to be happier<br>
<br>
...
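
Both problems in the question's code can be reproduced offline against this structure. This is a sketch under assumptions: SNIPPET is a hypothetical, trimmed copy of the markup above, and the built-in 'html.parser' stands in for 'lxml'. It shows that find_all("section") matches nothing, so the outer loop body never runs and nothing is printed, and that calling .select on the result list (rather than on the loop variable) would raise AttributeError if it were ever reached.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed copy of the structure shown above.
SNIPPET = """
<div class="song_body-lyrics">
  <div class="lyrics">
    <p>[Intro]<br>Lately, I've been, I've been thinking<br></p>
  </div>
</div>
"""

soup = BeautifulSoup(SNIPPET, "html.parser")

# 1) There is no <section> tag, so the result is empty and a loop over it does nothing.
sections = soup.find_all("section")
print(len(sections))

# 2) find_all() returns a list-like ResultSet, which has no .select() method.
try:
    sections.select("p")
    select_failed = False
except AttributeError:
    select_failed = True
print(select_failed)

# Selecting the <p> tags inside <div class="lyrics"> finds the text instead.
paragraphs = soup.select("div.lyrics p")
for p in paragraphs:
    print(p.get_text(separator="\n", strip=True))
```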
