
Python with BeautifulSoup - remove tags

I am writing a Python program to extract lyrics.

The code I use:

    import urllib
    from bs4 import BeautifulSoup
    url = urllib.urlopen("http://www.lyricsnmusic.com/david-bowie/slip-away-lyrics/22143075")
    soup = BeautifulSoup(url.read())
    print soup.find('pre', itemprop='description')

The result gives me what I need, but it also includes the tag itself, for example <pre itemprop="description"> followed by the lyrics. Does anyone know how to get only the lyrics? The page puts the lyrics inside the pre tag. Thanks in advance.

Use the text attribute of the node that you've found:

    import urllib
    from bs4 import BeautifulSoup

    # Python 2 code, matching the question
    url = urllib.urlopen("http://www.lyricsnmusic.com/david-bowie/slip-away-lyrics/22143075")
    soup = BeautifulSoup(url.read())
    # find() returns the whole <pre> tag; .text holds only its text content
    desc = soup.find('pre', itemprop='description')
    print desc.text
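
For readers on Python 3, the same idea can be sketched without touching the network. The HTML string below is a made-up stand-in for the lyrics page (the real site's markup may differ), and get_text() is used here as the method form of the text attribute; treat this as an illustration under those assumptions rather than the original answer's code.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page; the real site's markup may differ.
html = """
<html><body>
<pre itemprop="description">First line of lyrics
Second line of lyrics</pre>
</body></html>
"""

# "html.parser" is the parser bundled with Python, so no extra install is needed.
soup = BeautifulSoup(html, "html.parser")
desc = soup.find("pre", itemprop="description")

# Printing desc itself would include the <pre ...> tag.
# get_text() returns only the text inside it.
lyrics = desc.get_text() if desc is not None else ""
print(lyrics)
```

To fetch a live page in Python 3, urllib.request.urlopen takes the place of urllib.urlopen. The None check guards against find() not matching anything, which would otherwise raise an AttributeError.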


 