
Regex to allow commas and periods

I tried searching for an answer to this question and also reading the Regex Wiki but I couldn't find what I'm looking for exactly.

I have a program that validates a document. (It was written by someone else).

If certain lines or characters don't match the regex, an error is generated. I've noticed that a few false errors are always generated, and I want to correct this. I believe I have narrowed down the problem.

Here is an example of an error flagged by the program logic:

ERROR: File header immediate origin name is invalid: CITIBANK, N.A. 

Here is the code that causes that error:

if (!strLine.substring(63, 86).matches("[A-Z,a-z,0-9, ]+")) {
    // The field contains a character outside the allowed class: report and log it.
    JOptionPane.showMessageDialog(null, "ERROR: File header immediate origin name is invalid: " + strLine.substring(63, 86));
    errorFound = true;
    fileHeaderErrorFound = true;
    bw.write("ERROR: File header immediate origin name is invalid: " + strLine.substring(63, 86));
    bw.newLine();
}

I believe the error is raised at runtime because the text contains a period and a comma. I am unsure how to allow these in the regex.

I have tried using this

if(strLine.substring(63,86).matches("[A-Z,a-z,0-9,,,. ]+")){

and it seemed to work. I just wanted to make sure that this is the correct way, because it doesn't look right.

You're right in your analysis: the match failed because the text contains a dot, which isn't in the character class.

However, you can simplify the regex - no need to repeat the commas, they don't have any special meaning inside a class:

if(strLine.substring(63,86).matches("[A-Za-z0-9,. ]+"))
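To check the difference between the two character classes, here is a minimal sketch. The class name `OriginNameCheck`, the helper `isValid`, and the space-padded 23-character field are assumptions made for illustration; they are not part of the original program.

```java
import java.util.regex.Pattern;

final class OriginNameCheck {
    // Class from the question: letters, digits, comma and space, but no period.
    static final Pattern ORIGINAL = Pattern.compile("[A-Z,a-z,0-9, ]+");
    // Simplified class from the answer: letters, digits, comma, period and space.
    static final Pattern SIMPLIFIED = Pattern.compile("[A-Za-z0-9,. ]+");

    // True only if the whole field consists of allowed characters.
    static boolean isValid(Pattern pattern, String field) {
        return pattern.matcher(field).matches();
    }

    public static void main(String[] args) {
        // Hypothetical fixed-width field: the name left-aligned and padded to 23 characters.
        String field = String.format("%-23s", "CITIBANK, N.A.");
        System.out.println(isValid(ORIGINAL, field));   // false: the period is rejected
        System.out.println(isValid(SIMPLIFIED, field)); // true
    }
}
```

Compiling the pattern once, rather than calling `String.matches` on every line, also avoids recompiling the regex for each record in the file.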

Are you sure that you'll never have to match non-ASCII letters or any other kind of punctuation, though?
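If non-ASCII letters do turn out to be possible, one option is Java's Unicode property classes. The following is only a sketch under that assumption; the class name `UnicodeNameCheck` and the sample names are hypothetical, and whether such characters are legal in this file format is something the format's specification would have to confirm.

```java
import java.util.regex.Pattern;

final class UnicodeNameCheck {
    // ASCII-only class from the answer above.
    static final Pattern ASCII_NAME = Pattern.compile("[A-Za-z0-9,. ]+");
    // \p{L} matches any Unicode letter, \p{N} any Unicode number character.
    static final Pattern UNICODE_NAME = Pattern.compile("[\\p{L}\\p{N},. ]+");

    static boolean isAsciiName(String field) {
        return ASCII_NAME.matcher(field).matches();
    }

    static boolean isUnicodeName(String field) {
        return UNICODE_NAME.matcher(field).matches();
    }

    public static void main(String[] args) {
        // Hypothetical name containing an accented letter (\u00C9 is E with acute accent).
        String accented = "CR\u00C9DIT AGRICOLE S.A.";
        System.out.println(isAsciiName(accented));   // false
        System.out.println(isUnicodeName(accented)); // true
        // Other punctuation, such as '&', is still rejected by both classes.
        System.out.println(isUnicodeName("AT&T"));   // false
    }
}
```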

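Letters and digits (a-zA-Z0-9) can largely be replaced by \w, the "word character" class. Note that \w also matches the underscore, and that the backslash must be doubled inside a Java string literal. The period and comma don't need escaping inside a character class and can be used as is. Keeping the space and the + quantifier from the original pattern, this regex might come in handy:

"[\\w,. ]+"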
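A quick way to compare the `\w` variant with the explicit class is sketched below. The class and method names are hypothetical. The comments point out where the two differ, which matters if underscores should not be accepted in the origin name.

```java
final class WordClassCheck {
    // \w is [a-zA-Z_0-9] by default in Java, so the underscore is accepted too.
    // The backslash is doubled because this is a Java string literal.
    static boolean matchesWordClass(String field) {
        return field.matches("[\\w,. ]+");
    }

    // Explicit class from the earlier answer: no underscore.
    static boolean matchesExplicitClass(String field) {
        return field.matches("[A-Za-z0-9,. ]+");
    }

    public static void main(String[] args) {
        System.out.println(matchesWordClass("CITIBANK, N.A."));     // true
        System.out.println(matchesExplicitClass("CITIBANK, N.A.")); // true
        System.out.println(matchesWordClass("ACME_BANK"));          // true: underscore allowed
        System.out.println(matchesExplicitClass("ACME_BANK"));      // false
        // Without the + quantifier and the space, matches() only accepts a single character.
        System.out.println("CITIBANK, N.A.".matches("[\\w,.]"));    // false
    }
}
```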

Hope this helps. :)

