
java regex match string containing words with no digits and optionally separated by comma

Inspired by a previous question, I'm trying to find a regex that matches a string containing at least one word made up only of letters, with no digits, so \\w is not applicable. Comma-separated words are OK only if there are not two commas in a row.

This is the best I've found:

(.*\s+,?)*([a-zA-Z]+)+(,?\s+.*)*

but it doesn't match the following strings:

aaaaa,11111
11111,aaaaa
11111,aaaaa,
,aaaaa
aaaaa,
,aaaaa,
aaaaa,11111,,
,,aaaaa,bbbbb
aaaaa,,bbbbb,ccccc
aaaaa,bbbbb,,ccccc
aaaaa,bbbbb,ccccc
aaaaa,11111
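
One likely reason the attempt fails on these inputs: both optional groups require at least one whitespace character (\s+), so a comma on its own never counts as a separator. A minimal sketch illustrating this, with a hypothetical class name AttemptDemo and helper matches that are not part of the question:

```java
import java.util.regex.Pattern;

class AttemptDemo {
    // The attempt from the question, unchanged.
    static final Pattern ATTEMPT =
        Pattern.compile("(.*\\s+,?)*([a-zA-Z]+)+(,?\\s+.*)*");

    // Full-string match, the same semantics as String.matches.
    static boolean matches(String input) {
        return ATTEMPT.matcher(input).matches();
    }

    public static void main(String[] args) {
        // The space after the comma satisfies the \s+ in the trailing group.
        System.out.println(matches("aaaaa, 11111")); // true
        // No whitespace at all, so neither optional group can consume ",11111".
        System.out.println(matches("aaaaa,11111"));  // false
    }
}
```

Any fix therefore has to treat a single comma as a valid word boundary by itself, not only as an optional prefix to whitespace.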

Here's a test program to determine if a regex is correct:

import java.util.*;
import java.lang.*;
import java.io.*;

class Ideone
{
public static void main (String[] args) throws java.lang.Exception
{
    String regex = "(.*\\s+,?)*([a-zA-Z]+)+(,?\\s+.*)*";
    String shouldMatch[] = new String[] {
        "aaaaa",
        "aaaaa bbbbb",
        "aaaaa 11111",
        "11111 aaaaa",
        "aaaaa,11111",
        "aaaaa, 11111",
        "aaaaa,  11111",
        "11111,aaaaa",
        "11111, aaaaa",
        "11111,  aaaaa",
        "11111,aaaaa,",
        ",aaaaa",
        "aaaaa,",
        ",aaaaa,",
        "aaaaa,11111,,",
        ",,aaaaa,bbbbb",
        "aaaaa1111 bbbbb",
        "aaaaa1111 bbbbb ccccc",
        "aaaaa1111bbbbb ccccc",
        "aaaaa11111bbbbb ccccc 22222",
        ",,aaaaa bbbbb",
        "aaaaa,,bbbbb ccccc",
        "aaaaa,,bbbbb,ccccc",
        "aaaaa,bbbbb,,ccccc",
        "aaaaa,bbbbb,ccccc",
        "aaaaa,11111"
    };

    String shouldNotMatch[] = new String[] {
        "aaaaa11111",
        "11111bbbbb",
        "aaaaa11111bbbbb",
        "aaaaa11111bbbbb 11111ccccc",
        "aaaaa11111bbbbb ccccc11111",
        "aaaaa,,bbbbb",
        "aaaaa,,11111",
        ",,aaaaa",
        "aaaaa,,",
        "11111",
        "11111,22222",
        "11111 22222",
        ""
    };

    boolean result = true;

    for(String stringToTest : shouldMatch){
        if (!(stringToTest.matches(regex))){
            System.out.println(stringToTest + " does not match. WRONG.");
            result = false;
        }
    }

    for(String stringToTest : shouldNotMatch){
        if (stringToTest.matches(regex)){
            System.out.println(stringToTest + " matches. WRONG.");
            result = false;
        }
    }

    if (result){
        System.out.println("Congratulations, your regex is right.");
    }
    else {
        System.out.println("The result of one or more tests is wrong.");
    }
}
}

Edit: Added some more strings that should not match the regex: the empty string and digits-only strings (with commas or spaces).

This works; I checked it with your test program:

String regex = "^.*?(?<=\\s|^|,)(?<!,,)[A-Za-z]+(?!,,)(?=\\s|,|$).*$";


^ anchors the match at the beginning of the string

.*? lazily matches any run of non-newline characters

(?<=\\s|^|,) positive lookbehind for whitespace, the beginning of the string, or a comma, since these are the only things that can come before a word as defined here

(?<!,,) negative lookbehind for ,, since a double comma is not allowed before the word

[A-Za-z]+ one or more letters

(?!,,) negative lookahead for ,, since a double comma is not allowed after the word

(?=\\s|,|$) positive lookahead for whitespace, a comma, or the end of the string, since these are the only things that can come after a word as defined here

$ anchors the match at the end of the string
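
To see the lookarounds at work, here is a small sketch that runs the regex above against four strings taken from the question's test data. The class name LookaroundDemo and the helper hasPlainWord are hypothetical names chosen for illustration:

```java
import java.util.regex.Pattern;

class LookaroundDemo {
    // Regex from the answer above, compiled once for reuse.
    static final Pattern WORD = Pattern.compile(
        "^.*?(?<=\\s|^|,)(?<!,,)[A-Za-z]+(?!,,)(?=\\s|,|$).*$");

    // True if the whole input matches, i.e. it contains a qualifying word.
    static boolean hasPlainWord(String input) {
        return WORD.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "aaaaa,11111",         // letters-only word next to a single comma
            "aaaaa,,bbbbb,ccccc",  // "ccccc" has no double comma on either side
            "aaaaa,,bbbbb",        // every word touches a double comma
            "aaaaa11111"           // letters and digits in the same token
        };
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + hasPlainWord(s));
        }
    }
}
```

The first two samples should print true and the last two false. In the second sample the match comes from "ccccc": "aaaaa" and "bbbbb" are each rejected by one of the negative lookarounds.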

Based on your examples, the following should work:

String regex = "(?i)(?=.*?(?<!,,)\\b[a-z]+\\b(?!,,))[, \\w]+";
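
The two regexes in the answers are not interchangeable on inputs outside the question's list. The first lets any characters surround the word and accepts any whitespace as a separator, while the second limits the whole string to commas, literal spaces, and \w characters. A rough comparison sketch follows; the class name RegexComparison, the helpers first and second, and the tab-separated sample are assumptions added for illustration:

```java
import java.util.regex.Pattern;

class RegexComparison {
    // First answer: any characters may surround the word.
    static final Pattern LOOKAROUND = Pattern.compile(
        "^.*?(?<=\\s|^|,)(?<!,,)[A-Za-z]+(?!,,)(?=\\s|,|$).*$");
    // Second answer: the whole string is limited to commas, spaces and \w.
    static final Pattern RESTRICTED = Pattern.compile(
        "(?i)(?=.*?(?<!,,)\\b[a-z]+\\b(?!,,))[, \\w]+");

    static boolean first(String s)  { return LOOKAROUND.matcher(s).matches(); }
    static boolean second(String s) { return RESTRICTED.matcher(s).matches(); }

    public static void main(String[] args) {
        // The last sample uses a tab, which is not in the question's test data.
        String[] samples = { "aaaaa,11111", "aaaaa,,bbbbb", "aaaaa\t11111" };
        for (String s : samples) {
            System.out.println(first(s) + " " + second(s));
        }
    }
}
```

Both regexes accept the first sample and reject the second. On the tab-separated sample the first regex matches, because \s covers tabs, while the second does not, because a tab is outside its [, \w] character class. Which behaviour is preferable depends on what input the application expects.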
