
Scraping particular data from a website using Beautiful Soup

I am trying to scrape only the guitar chords of a song from this URL, https://tabs.ultimate-guitar.com/tab/jason-mraz/im-yours-chords-373896, and simply print them.

But I don't get any output when I print them. What am I doing wrong here? Below is my code.

    import requests
    from bs4 import BeautifulSoup

    url = 'https://tabs.ultimate-guitar.com/tab/jason-mraz/im-yours-chords-373896'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
    }
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    results = soup.find("div", {"class": "_4cjjy"})
    links = results.findAll("header", {"class": "_2jxI1"})
    for item in links:
        print("Chords: ", item)

You need to install selenium and chromedriver. The chords are most likely inserted by JavaScript after the page loads, which would explain why requests alone does not see them.

Use selenium to get the rendered HTML, then do the rest as normal with bs4:

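    from bs4 import BeautifulSoup
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    url = 'https://tabs.ultimate-guitar.com/tab/jason-mraz/im-yours-chords-373896'

    # Run Chrome without opening a window.
    browser_options = Options()
    browser_options.add_argument("--headless")

    # Selenium 4 takes the driver path through a Service object
    # (older releases accepted executable_path directly).
    browser = webdriver.Chrome(service=Service(r'chromedriver.exe'),
                               options=browser_options)
    try:
        browser.get(url)
        # Grab the DOM after the page's scripts have run.
        html_source_code = browser.execute_script("return document.body.innerHTML;")
    finally:
        browser.quit()

    soup = BeautifulSoup(html_source_code, 'html.parser')
    # Class names as they appeared on the page when this answer was written.
    links = soup.find_all("span", class_="_3bHP1 _3ffP6")

    for item in links:
        print("Chords: ", item.text)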

Output:

Chords:  G
Chords:  D
Chords:  Em
Chords:  C
Chords:  G
Chords:  D
Chords:  Em
Chords:  C
Chords:  G
Chords:  D
Chords:  Em
Chords:  C
Chords:  G
Chords:  D
Chords:  Em
Chords:  C
Chords:  G
Chords:  D
Chords:  Em
Chords:  C
Chords:  G
Chords:  D
Chords:  Em
Chords:  C
Chords:  G
Chords:  D
Chords:  Em
Chords:  C
Chords:  A7
Chords:  G
Chords:  D
Chords:  Dsus4
Chords:  Em
Chords:  C
Chords:  G
Chords:  D
Chords:  Em
Chords:  C
Chords:  G
Chords:  G
Chords:  D
Chords:  Em
Chords:  D
Chords:  C
Chords:  A7
Chords:  G
Chords:  Bm
Chords:  Em
Chords:  D
Chords:  C
Chords:  A7
Chords:  G
Chords:  D
Chords:  Em
Chords:  C
Chords:  G
Chords:  D
Chords:  Em
Chords:  C
Chords:  G
Chords:  D
Chords:  Dsus4
Chords:  Em
Chords:  C
Chords:  G
Chords:  D
Chords:  D
Chords:  Em
Chords:  C
Chords:  G
Chords:  D
Chords:  Dsus4
Chords:  Em
Chords:  C
Chords:  A7
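
The listing above repeats each chord every time it occurs in the song. If only the distinct chords are wanted, a small sketch like the following could collapse the list while keeping first-seen order. The helper name `unique_in_order` and the `sample` list are made up for illustration; in practice the input would be the `item.text` values collected in the loop.

```python
def unique_in_order(chords):
    """Return each chord once, in the order it first appears."""
    # dict preserves insertion order, so duplicate keys are dropped
    # while the first occurrence keeps its position.
    return list(dict.fromkeys(chords))


# Hypothetical sample standing in for the scraped chord names.
sample = ["G", "D", "Em", "C", "G", "D", "Em", "C", "A7"]
distinct = unique_in_order(sample)
print(distinct)
```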

Example HTML code:

<span class="_3bHP1 _3ffP6" data-name="G" style="color: rgb(0, 0, 0);">G</span>
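
Class names such as `_3bHP1` look generated and may change between site builds, so one possible alternative is to match on the `data-name` attribute shown in the example span. The sketch below assumes every chord span carries that attribute, which the single example suggests but does not prove. The HTML fragment is a hypothetical stand-in for the rendered page source.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the example span; not real page output.
html = """
<span class="_3bHP1 _3ffP6" data-name="G" style="color: rgb(0, 0, 0);">G</span>
<span class="_3bHP1 _3ffP6" data-name="D" style="color: rgb(0, 0, 0);">D</span>
<span class="lyrics">not a chord</span>
"""

soup = BeautifulSoup(html, "html.parser")

# attrs={"data-name": True} matches any span that has the attribute,
# whatever its value, so the generated class names are not needed.
chord_spans = soup.find_all("span", attrs={"data-name": True})
chords = [span["data-name"] for span in chord_spans]
print(chords)
```

With the selenium approach, `html` would be replaced by the rendered source returned from the browser.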
