I am developing a small tool to scrape a webpage, using Beautiful Soup. I would like to fetch the value of the class attribute of an element on the page. The HTML looks something like this:
<span class='class_id' id='New_line'></span>
How would I obtain class_id?
This answer refers to an older version of the question, in which Beautiful Soup was not mentioned.
You can use lxml and iterate over the span elements, asking each one for the value of its "class" attribute. lxml is a library for parsing XML and HTML documents.
For example:
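from lxml import etree

# filename is the path to the saved page; HTMLParser tolerates markup that is not valid XML
root = etree.parse(filename, etree.HTMLParser()).getroot()
for span in root.iterdescendants("span"):
    cls = span.attrib.get("class")  # None if the span has no class attribute
    print(cls)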
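If lxml is not available, the same idea can be sketched with the standard library's xml.etree.ElementTree, whose element API lxml.etree follows. This is a swapped-in alternative, not the answer's method, and it assumes the markup is well-formed XML (real-world HTML often is not, which is where lxml's HTML parser helps). The function name span_classes and the wrapping div are illustrative only.

```python
import xml.etree.ElementTree as ET

# Sample markup from the question, wrapped in a div (assumption: well-formed XML)
HTML = "<div><span class='class_id' id='New_line'></span></div>"


def span_classes(markup):
    """Return the class attribute of every <span> element, in document order."""
    root = ET.fromstring(markup)
    # iter("span") visits the root and all its descendants with that tag
    return [span.get("class") for span in root.iter("span")]


print(span_classes(HTML))
```

Each value is a plain string (or None for a span without a class), since ElementTree does not split multi-valued attributes.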
Does the following example help?
>>> from BeautifulSoup import BeautifulSoup as B
>>> s = B("<span class='class_id' id='New_line'></span>")
>>> s.span.attrs
[(u'class', u'class_id'), (u'id', u'New_line')]
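The session above uses the old BeautifulSoup 3 package. With the current bs4 package, a minimal sketch (assuming bs4 is installed and using Python's built-in "html.parser") could look like this. Note that bs4 treats class as a multi-valued attribute, so it comes back as a list of class names rather than a single string.

```python
from bs4 import BeautifulSoup

HTML = "<span class='class_id' id='New_line'></span>"

soup = BeautifulSoup(HTML, "html.parser")
# Locate the span by its id, then read its class attribute
span = soup.find("span", id="New_line")
classes = span.get("class")  # a list, because class is multi-valued in bs4
print(classes)
print(" ".join(classes))  # join back into a single string if needed
```

Using span.get("class") rather than span["class"] returns None instead of raising KeyError when the attribute is missing.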