
Remove Comments from HTML Tags

Referring to "How can I strip comment tags from HTML using BeautifulSoup?", I am trying to remove the comments from the tag below:

>>> h
<h4 class="col-sm-4"><!-- react-text: 124 -->52 Week High/Low:<!-- /react-text --><b><!-- react-text: 126 --> ₹ <!-- /react-text --><!-- react-text: 127 -->394.00<!-- /react-text --><!-- react-text: 128 --> / ₹ <!-- /react-text --><!-- react-text: 129 -->252.10<!-- /react-text --></b></h4>

My code:

comments = h.findAll(text=lambda text:isinstance(text, Comment))
[comment.extract() for comment in comments]
print h

But the search for comments returns nothing. I want to extract the two values "52 Week High/Low:" and "₹ 394.00 / ₹ 252.10" from the tag above.

I also tried removing the comment tags from the entire HTML document using:

soup = BeautifulSoup(html)
comments = soup.findAll(text=lambda text:isinstance(text, Comment))
[comment.extract() for comment in comments]
print soup

But the comments are still there. Any suggestions?

Are you using Python 2.7 and BeautifulSoup 4? If not the latter, I would install BeautifulSoup 4:

pip install beautifulsoup4
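To confirm which library is actually in use, a quick check like the one below may help. It is a sketch assuming Python 3 and an installed bs4. The idea is that `Comment` has to come from the same package that built the tree: `bs4.Comment` is a different class from the comment class BeautifulSoup 3 creates, so mixing the two would make the `isinstance` filter match nothing. That is one possible cause here, not something the question confirms.

```python
import bs4
from bs4 import BeautifulSoup, Comment

# Version string of the installed bs4 package
print(bs4.__version__)

# Parse a tiny document whose first child of <p> is a comment
soup = BeautifulSoup("<p><!-- note -->text</p>", "html.parser")
first_child = soup.p.contents[0]

# True when the tree and the Comment class both come from bs4
print(isinstance(first_child, Comment))  # True
```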

The following script works for me. I copied the HTML from your question above and ran it.

# -*- coding: utf-8 -*-
# (Python 2 needs the line above because the HTML below contains ₹.)
from bs4 import BeautifulSoup, Comment

html = """<h4 class="col-sm-4"><!-- react-text: 124 -->52 Week High/Low:<!-- /react-text --><b><!-- react-text: 126 --> ₹ <!-- /react-text --><!-- react-text: 127 -->394.00<!-- /react-text --><!-- react-text: 128 --> / ₹ <!-- /react-text --><!-- react-text: 129 -->252.10<!-- /react-text --></b></h4>"""
# Name the parser explicitly; bs4 warns when it has to guess one.
soup = BeautifulSoup(html, "html.parser")
comments = soup.findAll(text=lambda text: isinstance(text, Comment))

# nit: It isn't good practice to use a list comprehension only for its
# side effects (it wastes space building a list that is never used).
for comment in comments:
    comment.extract()

print(soup)
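Once the comments are gone, the two values from the question can be read out of the `<h4>`. This is a minimal sketch assuming Python 3, bs4 4.4 or later (for the `string=` argument), and that the label is the only text sitting directly inside `<h4>` while the prices live inside `<b>`; `label` and `values` are just illustrative names.

```python
from bs4 import BeautifulSoup, Comment

html = """<h4 class="col-sm-4"><!-- react-text: 124 -->52 Week High/Low:<!-- /react-text --><b><!-- react-text: 126 --> ₹ <!-- /react-text --><!-- react-text: 127 -->394.00<!-- /react-text --><!-- react-text: 128 --> / ₹ <!-- /react-text --><!-- react-text: 129 -->252.10<!-- /react-text --></b></h4>"""

soup = BeautifulSoup(html, "html.parser")

# Remove every comment node; find_all returns a list, so extracting is safe
for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
    comment.extract()

h4 = soup.find("h4")

# Label: the first string that is a direct child of <h4> (not inside <b>)
label = h4.find(string=True, recursive=False).strip()
# Values: all text inside <b>, concatenated, with outer spaces trimmed
values = h4.find("b").get_text().strip()

print(label)   # 52 Week High/Low:
print(values)  # ₹ 394.00 / ₹ 252.10
```

As far as I know, `get_text()` skips `Comment` nodes by default in bs4, so `values` would likely come out the same even without the removal loop; removing the comments mainly keeps `print(soup)` clean and makes the `label` lookup safe, since `string=True` would otherwise match the leading comment.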

Note: it's a good thing you included the print statement; I wouldn't have known it was Python 2 otherwise. Stating your Python version helps too.
