I have text like the one below.
I am just trying to extract the content of the outermost p tag.
I don't want to eliminate the inner <p> tags or any other tags between them.
d = "<p><p>{'Area': 'Square',</p>\n<p> <tr> <td>'Flag': 'com'}</p></p>"
My code is below
import re
re.sub('<[^<>]+>', '',d)
My output is
"{'Area': 'Square',\n\xa0\xa0'Flag': 'com'}"
The expected output removes only the first and last p tags:
"<p>{'Area': 'Square',</p>\n<p> <tr> <td>'Flag': 'com'}</p>"
Use
re.sub(r'^<p>(.*)</p>$', r'\1', d, flags=re.S)
See regex proof.
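As a quick check, the call above can be run against the sample string from the question. This is a minimal sketch: the names `result` and `expected` are introduced here for illustration, and the whitespace inside `d` is assumed to be a plain space.

```python
import re

# Sample string from the question (whitespace assumed to be a plain space).
d = "<p><p>{'Area': 'Square',</p>\n<p> <tr> <td>'Flag': 'com'}</p></p>"

# Strip only the outermost <p>...</p> pair; re.S lets "." match the "\n" too.
result = re.sub(r'^<p>(.*)</p>$', r'\1', d, flags=re.S)

# The expected output given in the question.
expected = "<p>{'Area': 'Square',</p>\n<p> <tr> <td>'Flag': 'com'}</p>"

print(result == expected)  # True
```

The inner tags survive because the greedy `.*` runs to the end of the string and backtracks only as far as the last `</p>`.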
EXPLANATION
--------------------------------------------------------------------------------
^ the beginning of the string
--------------------------------------------------------------------------------
<p> '<p>'
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
.* any character, including \n because re.S is set (0 or more times
(matching the most amount possible))
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
</p> '</p>'
--------------------------------------------------------------------------------
$ before an optional \n, and the end of the
string
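The re.S flag matters here because the sample string contains a newline. The sketch below compares the same pattern with and without the flag; `without_flag` and `with_flag` are names made up for this illustration.

```python
import re

# Sample string from the question (whitespace assumed to be a plain space).
d = "<p><p>{'Area': 'Square',</p>\n<p> <tr> <td>'Flag': 'com'}</p></p>"

pattern = r'^<p>(.*)</p>$'

# Without re.S, "." stops at the "\n", so the pattern cannot span the
# whole string and re.sub returns the input unchanged.
without_flag = re.sub(pattern, r'\1', d)

# With re.S, "." also matches "\n", so the outer pair is removed.
with_flag = re.sub(pattern, r'\1', d, flags=re.S)

print(without_flag == d)  # True
print(with_flag == d)     # False
```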