
Regex to find substrings between special characters

I am running into this problem in Java.

I have data strings that contain entities enclosed between & and ; characters, for example:

&Text.ABC;, &Links.InsertSomething; 

These entities can be anything from the ini file we have.

I need to find these entities in the input string and remove them. There can be zero, one, or more occurrences in the input string.

I have tried to match them with a regex, without success.

Can anyone suggest the regex for this problem?

Thanks!

Here is the regex:

"&[A-Za-z]+(\\.[A-Za-z]+)*;"

It starts by matching the character &, followed by one or more letters, upper or lower case ( [A-Za-z]+ ). Then it matches a dot followed by one or more letters ( \\.[A-Za-z]+ ; the backslash is doubled because the pattern is written as a Java string literal). This dotted group can repeat any number of times, including zero. Finally, it matches the ; character.

You can use this regex in Java like this:

Pattern p = Pattern.compile("&[A-Za-z]+(\\.[A-Za-z]+)*;"); // java.util.regex.Pattern
String subject = "foo &Bar; baz\n";
String result = p.matcher(subject).replaceAll("");

Or, more briefly, with String.replaceAll:

"foo &Bar; baz\n".replaceAll("&[A-Za-z]+(\\.[A-Za-z]+)*;", "");
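
The question mentions inputs with zero, one, or several entities, so here is a minimal sketch that runs the same pattern over each case. The class name EntityStripper and the method strip are hypothetical, chosen only for illustration; the pattern is the one given above.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name and method are illustrative only.
class EntityStripper {
    // Same pattern as above, compiled once so it can be reused across calls.
    private static final Pattern ENTITY =
            Pattern.compile("&[A-Za-z]+(\\.[A-Za-z]+)*;");

    // Returns the input with every &Name; or &Name.Sub; entity removed.
    static String strip(String input) {
        return ENTITY.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("no entities here"));                    // zero entities
        System.out.println(strip("foo &Bar; baz"));                       // one entity
        System.out.println(strip("&Text.ABC;, &Links.InsertSomething;")); // two entities
    }
}
```

Note that only the entities themselves are removed: "foo &Bar; baz" becomes "foo  baz" with two spaces, and the comma and space between the two entities in the last input are left in place.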

If you also want to remove the whitespace that follows each matched entity, you can use this regex:

"&[A-Za-z]+(\\.[A-Za-z]+)*;\\s*" // "\\s*" matches any run of whitespace, including none
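
As a rough illustration of what the trailing \\s* changes, the sketch below applies this variant to a few inputs. The class name TrailingWhitespaceDemo and the method stripWithTrailingSpace are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class TrailingWhitespaceDemo {
    // Entity pattern followed by \s*, so whitespace after each entity is consumed too.
    private static final Pattern ENTITY_AND_SPACE =
            Pattern.compile("&[A-Za-z]+(\\.[A-Za-z]+)*;\\s*");

    // Removes each entity together with any whitespace that directly follows it.
    static String stripWithTrailingSpace(String input) {
        return ENTITY_AND_SPACE.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripWithTrailingSpace("foo &Bar; baz"));
        System.out.println(stripWithTrailingSpace("&A; &B.C; end"));
    }
}
```

With this variant "foo &Bar; baz" becomes "foo baz". Whitespace before an entity is not touched, and \s also matches newlines and tabs, so line breaks that follow an entity are removed as well.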

There is also a handy online regular expression tester that uses the Java regex library:

http://www.regexplanet.com/simple/index.html

You can try:

input=input.replaceAll("&[^.]+\\.[^;]+;(,\\s*&[^.]+\\.[^;]+;)*","");
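
This second pattern behaves differently from the letter-based one: it requires a dot inside each entity, it also consumes the ", " separator between consecutive entities, and because [^.]+ and [^;]+ accept any characters other than the dot and the semicolon, it can match across ordinary text. The sketch below, with a hypothetical class name SecondPatternDemo, shows these cases so you can check whether they suit your data.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class SecondPatternDemo {
    // The pattern from the line above, compiled once for reuse.
    private static final Pattern DOTTED_ENTITIES =
            Pattern.compile("&[^.]+\\.[^;]+;(,\\s*&[^.]+\\.[^;]+;)*");

    static String strip(String input) {
        return DOTTED_ENTITIES.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // A comma-separated run of dotted entities is removed as a whole.
        System.out.println("[" + strip("&Text.ABC;, &Links.InsertSomething;") + "]");
        // An entity without a dot is not matched, so the input is unchanged.
        System.out.println("[" + strip("foo &Bar; baz") + "]");
        // Here the match runs from & to the first ; after the next dot.
        System.out.println("[" + strip("x &Bar; y. z; w") + "]");
    }
}
```

In the last input the match is "&Bar; y. z;", so ordinary text is removed along with the entity. If entity names consist only of letters and dots, the letter-based pattern is the stricter choice.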

