I want to use this regular expression in Python:
<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>
(from "RegEx match open tags except XHTML self-contained tags")
import re

def removeHtmlTags(page):
    # 'XXXX' stands for the regular expression above
    p = re.compile(r'XXXX')
    return p.sub('', page)
It seems that I cannot directly substitute the complex regular expression into the above function.
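Works fine here. You're probably having trouble because of the quotes: the pattern contains both ' and ", so an ordinary single- or double-quoted string literal ends at the first matching quote inside the pattern. Just triple-quote it: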
def removeHtmlTags(page):
    p = re.compile(r'''<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>''')
    return p.sub('', page)
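For a quick check, here is a self-contained sketch of the triple-quoted version. The names `TAG_PATTERN`, `remove_html_tags` and the sample string are made up for illustration; only the pattern itself comes from the question.

```python
import re

# A triple-quoted raw string lets the pattern contain both ' and " unescaped.
TAG_PATTERN = re.compile(r'''<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>''')

def remove_html_tags(page):
    """Return page with every substring matched by TAG_PATTERN removed."""
    return TAG_PATTERN.sub('', page)

# Hypothetical sample input, not from the original question.
sample = '<p class="intro">Hello, <b>world</b>!</p>'
print(remove_html_tags(sample))  # prints: Hello, world!
```

Compiling the pattern once at module level avoids recompiling it on every call, which is a design choice of this sketch rather than something the answer requires.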
If you need to remove HTML tags, this should do it:
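import re

def removeHtmlTags(page):
    # Match '<', then one or more characters that are not '>', then '>'.
    # '<' and '>' need no backslash, and re.I has no effect without letters.
    pattern = re.compile(r'<[^>]+>')
    return pattern.sub('', page)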
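The two patterns in this thread behave differently when an attribute value contains `>`. The sketch below compares them on one made-up input; the short pattern is written here as `<[^>]+>`, since `<` and `>` need no escaping, and the names `SIMPLE`, `QUOTE_AWARE` and `sample` are assumptions for illustration only.

```python
import re

# Short pattern: stops at the first '>' it sees, even inside a quoted attribute.
SIMPLE = re.compile(r'<[^>]+>')

# Pattern from the question: skips over quoted attribute values as a unit.
QUOTE_AWARE = re.compile(r'''<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>''')

# Hypothetical input whose attribute value contains '>'.
sample = '<a title="x > y">link</a>'

print(repr(SIMPLE.sub('', sample)))       # ' y">link'
print(repr(QUOTE_AWARE.sub('', sample)))  # 'link'
```

Neither pattern is a full HTML parser; for arbitrary real-world markup, a parser such as the standard library's `html.parser` is generally a safer choice.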