
Regular expression to remove quotations

How do I write a regular expression to satisfy these requirements? I can only use a string.replaceAll function.

a) For a ” that appears at the end of a paragraph which has a “ , but not “ “ “ “ : remove the ”

b) For a “ that appears at the beginning of a paragraph: remove the “ [NOTE: If there is “ “ “ “ , it should now be “ ]

c) For a ” that appears at the end of a paragraph without a matching “ at the beginning of the paragraph: remove the ”

EDIT:

Rule a)
Transform:
String input1 = "“remove quotes”";
String output1 = "“remove quotes";

Don't change anything:
String input1 = "““remove quotes”";
String output1 = "““remove quotes”";

Rule b)
Transform:
String input1 = "“remove quotes”";
String output1 = "remove quotes”";

Replace with single ldquo:
String input1 = "““remove quotes”";
String output1 = "“remove quotes”";

Rule c)
Do nothing (there is a matching ldquo):
String input1 = "“do not remove quotes”";
String output1 = "“do not remove quotes”";

Transform (no matching ldquo, hence remove rdquo):
String input1 = "remove quotes”";
String output1 = "remove quotes";

I think I am going to run all 3 rules separately on the string. What would the 3 regexes and replacement expressions be?

Description

This regex will do the following:

  1. if there are 2 initial “ and an ending ” , remove a single “
  2. if there is 1 initial “ and an ending ” , remove nothing
  3. if there are 0 initial “ and an ending ” , remove the ending ”

regex (escaped as in a Java string literal, hence the doubled backslash): ^(?=.*?”)“\\s*(“)|^(?=.*?”)(“.*?”)|^(?!“)(.*?)”

replace with: $1$2$3
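
As a rough illustration, the combined expression could be applied from Java along these lines. This is only a sketch, not the answerer's code: the class name `CombinedQuoteRegex`, the method `clean`, and the leading `(?m)` flag (so that `^` matches at the start of every line of a multi-line input) are assumptions added here. The quote characters are written as Unicode escapes so the source does not depend on file encoding.

```java
final class CombinedQuoteRegex {
    // U+201C is the left double quote, U+201D the right double quote.
    // (?m) is an addition so that ^ also matches after each line break.
    static final String REGEX =
        "(?m)^(?=.*?\u201D)\u201C\\s*(\u201C)"   // two leading left quotes: keep only the second
        + "|^(?=.*?\u201D)(\u201C.*?\u201D)"     // one leading left quote: put the match back unchanged
        + "|^(?!\u201C)(.*?)\u201D";             // no leading left quote: drop the first right quote

    static String clean(String text) {
        // Groups that did not take part in a match expand to an empty string.
        return text.replaceAll(REGEX, "$1$2$3");
    }

    public static void main(String[] args) {
        String input = "\u201C DO NOTHING  \u201D\n"
            + "\u201C \u201C REMOVE INITIAL LD  \u201D\n"
            + "REMOVE RD  \u201D";
        System.out.println(clean(input));
    }
}
```

Only the quote character itself is removed in the third case, so any spaces that preceded it stay in the result.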


Input text

“ DO NOTHING  ”
“ “ REMOVE INITIAL LD  ”
REMOVE RD  ”

Output text, respectively

“ DO NOTHING  ”
“ REMOVE INITIAL LD  ”
REMOVE RD

These expressions were hashed out in a chat session and written to be executed one at a time, in A, B, C order. However, because they are separate, they can be executed in whatever order the developer chooses, depending on the desired output.

A

  • 1 LD and 1 RD, remove the RD
  • 2 LD and 1 RD, do nothing
  • regex: ^(“(?!\\s*“).*?)”
  • replace with $1

B

  • 1 LD, remove 1 LD
  • 2 LD, remove 1 LD
  • regex: ^“(\\s*(?:“)?)
  • replace with $1

C

  • 1 LD and 1 RD, do nothing
  • 0 LD and 1 RD, remove the RD
  • regex: ^(?!“)(.*?)”
  • replace with $1
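
The three expressions above can be tried out with a small Java sketch like the one below. The class name `QuoteRules` and the method names `ruleA`, `ruleB` and `ruleC` are made up for illustration; the patterns are the ones listed under A, B and C, with the quote characters written as Unicode escapes.

```java
final class QuoteRules {
    // U+201C is the left double quote (LD), U+201D the right double quote (RD).

    // Rule A: a single leading LD (not followed by another LD), remove the first RD.
    static String ruleA(String s) {
        return s.replaceAll("^(\u201C(?!\\s*\u201C).*?)\u201D", "$1");
    }

    // Rule B: remove one leading LD; a second LD, if present, is kept.
    static String ruleB(String s) {
        return s.replaceAll("^\u201C(\\s*(?:\u201C)?)", "$1");
    }

    // Rule C: no leading LD, remove the first RD.
    static String ruleC(String s) {
        return s.replaceAll("^(?!\u201C)(.*?)\u201D", "$1");
    }

    public static void main(String[] args) {
        String single = "\u201Cremove quotes\u201D";
        String doubled = "\u201C\u201Cremove quotes\u201D";

        System.out.println(ruleA(single));   // RD removed
        System.out.println(ruleA(doubled));  // unchanged
        System.out.println(ruleB(single));   // LD removed
        System.out.println(ruleB(doubled));  // one LD left
        System.out.println(ruleC("remove quotes\u201D")); // RD removed

        // The order of application matters for the doubled case.
        System.out.println(ruleC(ruleB(ruleA(doubled)))); // A, B, C keeps the RD
        System.out.println(ruleC(ruleA(ruleB(doubled)))); // B, A, C removes the RD
    }
}
```

Without a multiline flag, `^` only matches at the very start of the string, so each call handles one paragraph at a time.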

If I understand correctly, strings like:

“ Criteria 1, ending with RD and beginning with LD, but not LDLD, remove RD ”
“ “ Criteria 1, ending with RD but beginning with LDLD, do nothing to RD ”
“ “ Criteria 2, beginning with LDLD, make it begin with LD ”
Criteria 3 with non-matching RD, remove RD ”

should become:

“ Criteria 1, ending with RD and beginning with LD, but not LDLD, remove RD
“ Criteria 1, ending with RD but beginning with LDLD, do nothing to RD ”
“ Criteria 2, beginning with LDLD, make it begin with LD ”
Criteria 3 with non-matching RD, remove RD

You can use the regex:

^(?:(“(?! “).*?)\s*”|(“) “(.*)|((?!“).*?)\s*”)$

And replace with $1$2$3$4.
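
For readers working in Java, a minimal sketch of this single-expression approach might look as follows. The class name `SingleRegexQuotes` and the method `clean` are hypothetical, and the sketch assumes one paragraph is passed per call, so no multiline flag is used. The quote characters are written as Unicode escapes.

```java
final class SingleRegexQuotes {
    // U+201C is the left double quote, U+201D the right double quote.
    static final String REGEX =
        "^(?:(\u201C(?! \u201C).*?)\\s*\u201D"   // one leading left quote: drop the closing quote
        + "|(\u201C) \u201C(.*)"                 // doubled left quote: collapse to a single one
        + "|((?!\u201C).*?)\\s*\u201D)$";        // no left quote: drop the closing quote

    static String clean(String paragraph) {
        // Only one alternative matches; the unused groups expand to nothing.
        return paragraph.replaceAll(REGEX, "$1$2$3$4");
    }

    public static void main(String[] args) {
        String[] paragraphs = {
            "\u201C Criteria 1, ending with RD and beginning with LD, but not LDLD, remove RD \u201D",
            "\u201C \u201C Criteria 2, beginning with LDLD, make it begin with LD \u201D",
            "Criteria 3 with non-matching RD, remove RD \u201D"
        };
        for (String p : paragraphs) {
            System.out.println(clean(p));
        }
    }
}
```

Because of the `\s*` before the closing quote, whitespace that preceded a removed ” is removed along with it.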

See how it works here.

Or if you meant the symbols, you can find another similar one here.

“ Criteria 1, ending with RD and beginning with LD, but not LDLD, remove RD ”
“ “ Criteria 1, ending with RD but beginning with LDLD, do nothing to RD ”
“ “ Criteria 2, beginning with LDLD, make it begin with LD ”
Criteria 3 with non-matching RD, remove RD ”

Become:

“ Criteria 1, ending with RD and beginning with LD, but not LDLD, remove RD
“ Criteria 1, ending with RD but beginning with LDLD, do nothing to RD ”
“ Criteria 2, beginning with LDLD, make it begin with LD ”
Criteria 3 with non-matching RD, remove RD

And here is the Debuggex picture, which might make the regex easier to understand:

[Image: regular expression diagram]

