简体   繁体   中英

Regex to remove HTML Tags, empty lines and blank spaces in sql query

I have a table that has a column feeback which is a free text from front end. This column has values like -

FEEDBACK
-Agent was listening and very attentive.

Agent showed all the houses and gave the right description

Agent was well versed & knew how to speak multiple
languages





 
-<p>Agent was well dressed for the event</p>

Since this is copy pasted, there are many spaces or empty lines between two lines sometiems that comes in the backend.

I want to remove all these and show the output like -

FEEDBACK
-Agent was listening and very attentive.
Agent showed all the houses and gave the right description
Agent was well versed & knew how to speak multiple
languages
-Agent was well dressed for the event

For this I use the below query -

select REGEXP_REPLACE(regexp_replace(  regexp_replace(
    regexp_replace(
      DBMS_LOB.SUBSTR(max(feedback),4000),
      /*
        Replace LF followed by any non-printable sequence that ends with newline
        with single newline
      */
      chr(10) || '[^[:graph:]]*(' || chr(13) || '?' || chr(10) || ')',
      chr(10) || '\1'
    ),
    /*Then replace newline repetitions*/
    '(' || chr(13) || '?' || chr(10) || ')+',
    '\1'
  ),'<.*?>'),'&nbsp;') as feedback
  from dual;

Is there any way I can merge these regex_replace and not use multiple regex_replace to cater to my requirement?

Not all, but those that replace to nothing can be combined with regex OR |

And better replace those first then.
As removing them could cause extra empty lines.

select
 regexp_replace(
   regexp_replace(DBMS_LOB.SUBSTR(max(feedback),4000)
   , '(&nbsp;)|(<[/[:alpha:]]+>)|([[:space:]]+$)','',1,0,'m') 
   , '(['||chr(13)||']?['||chr(10)||']){2,}','\1') as feedback
from your_table;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM