I am writing a simple MP3 downloader for a website. I got stuck while parsing the duration and size of the audio:
<div class="mp3-info">
1.69 mins
<br/>
2.33 mb
</div>
Now I need to extract 1.69 mins and 2.33 mb from the HTML above. I am using Python 3.4.
I would use BeautifulSoup4 to parse your HTML; see the Beautiful Soup documentation for details.
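from bs4 import BeautifulSoup  # BeautifulSoup4 is imported from the bs4 package

soup = BeautifulSoup(your_html_string, "html.parser")
divs = soup.find_all("div", {"class": "mp3-info"})
# Now extract the text of each matching div
texts = [div.get_text() for div in divs]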
Also, because mp3-info is a class rather than an id, there may be several such divs on the page, so expect a list of results.
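To make the "extract the text" step concrete, here is a minimal sketch. It assumes the bs4 package is installed and reuses the HTML from the question; mp3_info_fields is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = """
<div class="mp3-info">
1.69 mins
<br/>
2.33 mb
</div>
"""


def mp3_info_fields(soup):
    """Return one list of stripped text fragments per mp3-info div.

    Hypothetical helper: the class can occur several times on a page,
    so every matching div gets its own entry.
    """
    return [
        list(div.stripped_strings)
        for div in soup.find_all("div", class_="mp3-info")
    ]


soup = BeautifulSoup(html, "html.parser")
results = mp3_info_fields(soup)
print(results)
```

stripped_strings yields each text node with the surrounding whitespace removed, so the duration and the size come out as separate strings without having to split on the br tag yourself.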
You can extract the text from the HTML using the lxml library.
Here is a related Stack Overflow question: https://stackoverflow.com/a/4624146/315168
Once you have the length and size as text, split it into pieces. E.g.:
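# Placeholder value: in practice this is the element text extracted using lxml
text = "1.69 mins 2.33 mb"
# split() with no arguments splits on any whitespace, including newlines
minutes, min_suffix, megabytes, mega_suffix = text.split()
seconds = float(minutes) * 60.0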
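If the text might carry extra whitespace or other stray words, a regular expression is a bit more forgiving than a four-way split(). The following is only a sketch using the standard library. parse_mp3_info and INFO_RE are hypothetical names, and it assumes, like the snippet above, that "1.69 mins" is a decimal number of minutes and that the text always has the form "<number> mins <number> mb".

```python
import re

# Hypothetical pattern: a number followed by "min"/"mins", then a number
# followed by "mb". Whitespace (including newlines) between parts is allowed.
INFO_RE = re.compile(
    r"(\d+(?:\.\d+)?)\s*mins?\s+(\d+(?:\.\d+)?)\s*mb",
    re.IGNORECASE,
)


def parse_mp3_info(text):
    """Return (seconds, megabytes) parsed from the mp3-info text."""
    match = INFO_RE.search(text)
    if match is None:
        raise ValueError("unrecognised mp3-info text: %r" % text)
    minutes = float(match.group(1))
    megabytes = float(match.group(2))
    return minutes * 60.0, megabytes


# The text as it would come out of the div, newlines included.
seconds, megabytes = parse_mp3_info("\n1.69 mins\n\n2.33 mb\n")
print(seconds, megabytes)
```

Raising ValueError on a mismatch makes a change in the site's markup show up immediately instead of silently producing wrong numbers.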