
Regular Expression: How to Combine Lookahead and Lookbehind

I have a string of comma-delimited characters that I'm splitting. Some of these characters, though, could be commas. For example:

test = "a,b,c,d,,,e,f,g"

I know that (?<!,), is the regular expression for "any comma not preceded by a comma", and ,(?!,) is the regular expression for "any comma not followed by a comma". Could someone point me in the right direction and show me how to combine these two? The desired output is:

a  
b  
c  
d  
,  
e  
f  
g  

The program is in Java, so if someone knows a function specific to Java, that works, too.

Similar issue tackled in Regex: replace single characters

Just merging the two regexes that you have, as (?<!,),(?!,), should do the trick, unless there are subtle differences between Ruby and Java in this area.

If you want to delete all single , and replace ,,, with , specifically, then you could search with (?<!,)?,(?!,) and delete the matched characters, running that step twice.
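The two-pass deletion described above can be sketched in Java as follows. This is only an illustration under one assumption: because the lookbehind in (?<!,)?,(?!,) is optional, the sketch uses the simpler pattern ,(?!,) (a comma not followed by a comma), which should match the same commas. The class and method names are hypothetical.

```java
class CollapseCommas {
    // Deletes every comma that is not immediately followed by another comma.
    // Assumption: this stands in for (?<!,)?,(?!,), whose lookbehind is optional.
    static String stripOnce(String s) {
        return s.replaceAll(",(?!,)", "");
    }

    public static void main(String[] args) {
        String once = stripOnce("a,b,c,d,,,e,f,g");
        String twice = stripOnce(once);
        System.out.println(once);   // abcd,,efg
        System.out.println(twice);  // abcd,efg
    }
}
```

The first pass removes the separators and the last comma of the run of three; the second pass removes one more, leaving a single literal comma.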

You could use (.)(?:,|$) instead of lookahead/lookbehind.

(?:,|$) will match the commas in between or the end of line for the last character, while (.) will capture the character.

Obviously, this will work only if you are matching the string against the regex (for example with a Matcher), not if you are passing the expression to the String's split method; in that case you should do as you suggested and build the delimiter from the negative lookbehind (?<!,) and the negative lookahead (?!,).
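The capture-group approach above might look like this with a Matcher. It is a minimal sketch; the class name CaptureTokens and the helper tokens are hypothetical and not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureTokens {
    // (.) captures one character; (?:,|$) consumes the separator or the end of input.
    private static final Pattern TOKEN = Pattern.compile("(.)(?:,|$)");

    static List<String> tokens(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            out.add(m.group(1)); // only the captured character, not the separator
        }
        return out;
    }

    public static void main(String[] args) {
        // prints: a b c d , e f g
        System.out.println(String.join(" ", tokens("a,b,c,d,,,e,f,g")));
    }
}
```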

Split on a , if there is no , before it or no , after it.

    String str = "a,b,c,d,,,e,f,g";
    String regex = "(?<!,),|,(?!,)";

    for(String s : str.split(regex)) {            
        System.out.println(s);
    }

Output:

a  
b  
c  
d  
,  
e  
f  
g  
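To see what the alternation buys here, the sketch below compares it with the stricter pattern (?<!,),(?!,), which requires both conditions at once. The class and method names are hypothetical; only String.split and String.join from the standard library are used.

```java
class SplitComparison {
    // Splits only on commas that have no comma on either side.
    static String[] splitBoth(String s) {
        return s.split("(?<!,),(?!,)");
    }

    // Splits on commas that have no comma before them OR no comma after them.
    static String[] splitEither(String s) {
        return s.split("(?<!,),|,(?!,)");
    }

    public static void main(String[] args) {
        String str = "a,b,c,d,,,e,f,g";
        System.out.println(String.join(" ", splitBoth(str)));   // a b c d,,,e f g
        System.out.println(String.join(" ", splitEither(str))); // a b c d , e f g
    }
}
```

With the stricter pattern none of the three adjacent commas is a delimiter, so d,,,e stays together; with the alternation the outer two commas of the run are delimiters and the middle one survives as a token.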

The following finds each character followed by a comma (or the final character of the string) and drops the trailing comma by taking only the first character of each match:

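    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String test = "a,b,c,d,,,e,f,g";
    // Match any character followed by a comma, or the last character of the string.
    Pattern p = Pattern.compile(".,|.$");
    Matcher m = p.matcher(test);
    while (m.find()) {
        // Keep only the first character of each match, dropping the separator.
        System.out.println(m.group().charAt(0));
    }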

Assuming that for the data "a,b,c,d,,,,,e,f,g" the split should produce a b c d , , e f g, you could find each pair of commas and place a special mark between them. This way you know that a comma with the special mark after it needs to be removed, while a comma with the special mark before it needs to stay. Code based on that idea could look like this:

    String data = "a,b,c,d,,,,,e,f,g";
    data = data.replace(",,", ",XspecialSplitX,");

    String[] tokens = data.split(",XspecialSplitX|(?<!XspecialSplitX),");
    for (String s : tokens)
        System.out.print(s + " ");

Output: a b c d , , e f g


A faster and easier way, without regex:
If your string contains only single characters separated by commas, then all the wanted characters are at even indexes and the separating commas at odd indexes. In that case all you need to do is iterate over the even indexes, like this:

    char[] data = "a,b,c,d,,,e,f,g".toCharArray();
    for (int i = 0; i < data.length; i += 2)
        System.out.println(data[i]);


 