
Extracting song length and size from HTML using Python

I am making a simple mp3 downloader for a website. I got stuck parsing the length and size of each audio file:

<div class="mp3-info">
    1.69 mins
<br/>
    2.33 mb
</div>

Now I need to parse 1.69 mins and 2.33 mb from the HTML above. I am using Python 3.4.

I would use BeautifulSoup4 to parse your HTML. See the BeautifulSoup documentation.

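from bs4 import BeautifulSoup  # BeautifulSoup4 is imported from the bs4 package
soup = BeautifulSoup(your_html_string, "html.parser")
divs = soup.find_all("div", {"class": "mp3-info"})
# Now extract the text of the first match, with whitespace collapsed
text = divs[0].get_text(" ", strip=True)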

Also, because it's a class, there could be multiple matching elements on the page, so handle the result as a list rather than assuming a single match.
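
If several mp3-info blocks do appear on one page, looping over all of them might look like the sketch below. The sample markup and the variable names are assumptions made up for illustration; real pages will differ. stripped_strings yields each text fragment with the surrounding whitespace removed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page with two entries (not taken from a real site).
sample_html = """
<div class="mp3-info">
    1.69 mins
<br/>
    2.33 mb
</div>
<div class="mp3-info">
    3.10 mins
<br/>
    4.25 mb
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Collect one (length, size) pair of strings per matching div.
entries = []
for div in soup.find_all("div", class_="mp3-info"):
    # stripped_strings skips the <br/> tag and whitespace-only fragments;
    # this unpacking assumes each div holds exactly two text fragments.
    length_text, size_text = list(div.stripped_strings)
    entries.append((length_text, size_text))

print(entries)
```

Each entry is still plain text at this point; converting it to numbers is a separate step.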

You can extract text from HTML using the lxml library.

Here is a related StackOverflow answer: https://stackoverflow.com/a/4624146/315168

After you get the length and size out as text, split it into pieces. E.g.:

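 import lxml.html

 # html_string is assumed to hold the page source
 text = lxml.html.fromstring(html_string).find_class("mp3-info")[0].text_content()
 minutes, min_suffix, megabytes, mega_suffix = text.split()

 # Treat the length as decimal minutes and convert it to seconds
 seconds = float(minutes) * 60.0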
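
Note that text.split() raises ValueError unless the text contains exactly four whitespace-separated tokens. A slightly more defensive sketch uses a regular expression instead. parse_mp3_info is a hypothetical helper name, and the pattern assumes the "<number> mins ... <number> mb" layout shown in the question.

```python
import re

# Assumed layout: a number followed by "min"/"mins", then a number followed by "mb".
INFO_PATTERN = re.compile(r"([\d.]+)\s*mins?\s+([\d.]+)\s*mb", re.IGNORECASE)


def parse_mp3_info(text):
    """Return (seconds, megabytes) parsed from the div text, or None if no match."""
    match = INFO_PATTERN.search(text)
    if match is None:
        return None
    minutes, megabytes = match.groups()
    # Same conversion as above: treat the length as decimal minutes.
    return float(minutes) * 60.0, float(megabytes)


# Text as it would come out of the div in the question, whitespace included.
result = parse_mp3_info("\n    1.69 mins\n\n    2.33 mb\n")
print(result)
```

Returning None for unexpected text lets the caller skip malformed entries instead of crashing partway through a page.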


 