Extracting song length and size from HTML using Python

Question

I am making a simple MP3 downloader for a website, and I got stuck parsing the duration and file size of each audio file:

<div class="mp3-info">
    1.69 mins
<br/>
    2.33 mb
</div>

Now I need to extract 1.69 mins and 2.33 mb from the HTML above. I am using Python 3.4.

Answer 1

I would use BeautifulSoup4 to parse your HTML. See the BeautifulSoup documentation for details.

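from bs4 import BeautifulSoup

soup = BeautifulSoup(your_html_string, "html.parser")
for div in soup.find_all("div", class_="mp3-info"):  # class_ because "class" is a keyword
    # stripped_strings yields the text on either side of <br/>, whitespace removed
    duration, size = div.stripped_strings  # "1.69 mins", "2.33 mb"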

Also, because mp3-info is a class rather than an id, the page may contain several of these divs, so expect a list of matches rather than a single element.
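If installing BeautifulSoup is not an option, a similar extraction can be sketched with Python's built-in html.parser module instead. This is a swapped-in technique, not the approach this answer recommends; the class name MP3InfoParser is hypothetical, and the sketch assumes an mp3-info div never contains a nested div. It collects one list of text fragments per matching div, so multiple matches on a page are handled.

```python
from html.parser import HTMLParser


class MP3InfoParser(HTMLParser):
    """Hypothetical helper: collect the text inside every <div class="mp3-info">.

    Assumes mp3-info divs contain no nested <div> elements.
    """

    def __init__(self):
        # Pass convert_charrefs explicitly; Python 3.4 warns if it is omitted.
        super().__init__(convert_charrefs=True)
        self.results = []       # one list of text fragments per matching div
        self._in_info = False   # True while inside an mp3-info div

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # A div may carry several space-separated classes.
            classes = (dict(attrs).get("class") or "").split()
            if "mp3-info" in classes:
                self._in_info = True
                self.results.append([])

    def handle_endtag(self, tag):
        if tag == "div":
            self._in_info = False

    def handle_data(self, data):
        text = data.strip()
        if self._in_info and text:
            self.results[-1].append(text)


html = """<div class="mp3-info">
    1.69 mins
<br/>
    2.33 mb
</div>"""

parser = MP3InfoParser()
parser.feed(html)
parser.close()
print(parser.results)  # [['1.69 mins', '2.33 mb']]
```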

Answer 2

You can extract the text from the HTML using the lxml library.

Once you have the length and size as text, split it into pieces. For example:

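 import lxml.html

 doc = lxml.html.fromstring(your_html_string)
 div = doc.find_class("mp3-info")[0]   # first <div class="mp3-info"> on the page
 text = div.text_content()             # "1.69 mins" and "2.33 mb" plus whitespace
 minutes, min_suffix, megabytes, mega_suffix = text.split()

 seconds = float(minutes) * 60.0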
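Unpacking text.split() into four names assumes exactly four whitespace-separated tokens in a fixed order. As a slightly more defensive sketch, the hypothetical parse_mp3_info function below picks out each number/unit pair with a regular expression. It assumes, as the question's sample suggests, that the duration is given in decimal minutes ("mins") and the size in megabytes ("mb"); the unit names it recognizes are guesses based on that one sample.

```python
import re

# A number followed by a unit word, e.g. "1.69 mins" or "2.33 mb".
_VALUE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*([A-Za-z]+)")


def parse_mp3_info(text):
    """Hypothetical helper: return (seconds, megabytes) from mp3-info text."""
    seconds = megabytes = None
    for number, unit in _VALUE_RE.findall(text):
        unit = unit.lower()
        if unit.startswith("min"):
            # Assumption: "1.69 mins" means 1.69 decimal minutes, not 1:69.
            seconds = float(number) * 60.0
        elif unit == "mb":
            megabytes = float(number)
    if seconds is None or megabytes is None:
        raise ValueError("unexpected mp3-info text: %r" % text)
    return seconds, megabytes


seconds, megabytes = parse_mp3_info("\n    1.69 mins\n\n    2.33 mb\n")
print(round(seconds, 1), megabytes)  # 101.4 2.33
```

Because it matches units by name rather than position, the order of the two values in the div does not matter, and unexpected text fails loudly instead of silently producing wrong numbers.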