
How to extract the text between two HTML tags, or remove only the first and last tag

I have text like the string below.

  • I'm just trying to extract the content of the outer p tag

  • I don't want to remove the <p> tags or any other tags inside it

d = "<p><p>{'Area': 'Square',</p>\n<p> <tr> <td>'Flag': 'com'}</p></p>"

My code is below

import re
re.sub('<[^<>]+>', '',d)

My output is

"{'Area': 'Square',\n\xa0\xa0'Flag': 'com'}"

The expected output removes only the first <p> and the last </p> tag:

"<p>{'Area': 'Square',</p>\n<p> <tr> <td>'Flag': 'com'}</p>"

Use

re.sub(r'^<p>(.*)</p>$', r'\1', d, flags=re.S)
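As a quick check, the call can be run against the sample string from the question. This is only a minimal sketch: the names `expected` and `result` are introduced here for illustration, and it assumes the input always starts with `<p>` and ends with `</p>`, as the sample does.

```python
import re

# Sample input and desired output, copied from the question.
d = "<p><p>{'Area': 'Square',</p>\n<p> <tr> <td>'Flag': 'com'}</p></p>"
expected = "<p>{'Area': 'Square',</p>\n<p> <tr> <td>'Flag': 'com'}</p>"

# Strip only the outermost <p>...</p> pair; re.S lets "." cross the newline.
result = re.sub(r'^<p>(.*)</p>$', r'\1', d, flags=re.S)

print(result == expected)
```

If the string does not start with `<p>` or does not end with `</p>`, the pattern does not match and `re.sub` returns the input unchanged.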


EXPLANATION

--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  <p>                      '<p>'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    .*                       any character, including \n because
                             flags=re.S is set (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  </p>                     '</p>'
--------------------------------------------------------------------------------
  $                        before an optional \n, and the end of the
                           string
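The role of `flags=re.S` can be seen by running the same pattern with and without it. The sketch below is illustrative only; `pattern`, `without_dotall` and `with_dotall` are names made up for this example.

```python
import re

d = "<p><p>{'Area': 'Square',</p>\n<p> <tr> <td>'Flag': 'com'}</p></p>"
pattern = r'^<p>(.*)</p>$'

# Without re.S, "." cannot match the newline in the middle of the string,
# so the pattern never reaches the final </p> and nothing is replaced.
without_dotall = re.sub(pattern, r'\1', d)

# With re.S, "." also matches "\n", so the greedy group spans both lines
# and only the outermost <p>...</p> pair is dropped.
with_dotall = re.sub(pattern, r'\1', d, flags=re.S)

print(without_dotall == d)
print(with_dotall)
```

`re.S` is the short alias for `re.DOTALL`; either spelling behaves the same way here.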

