Regex for commas and periods allowed

I tried searching for an answer to this question and also read the Regex Wiki, but I couldn't find exactly what I'm looking for.

I have a program (written by someone else) that validates a document.

If certain lines or characters don't match a regex, an error is generated. I've noticed that a few false errors are always generated, and I want to fix this. I believe I have narrowed the problem down to the following.

For example, the program logic flags this error:

ERROR: File header immediate origin name is invalid: CITIBANK, N.A. 

Here is the code that raises it:

// Assumes javax.swing.JOptionPane is imported and bw is an open BufferedWriter.
if (strLine.substring(63, 86).matches("[A-Z,a-z,0-9, ]+")) {
    // Field is valid; nothing to do.
} else {
    JOptionPane.showMessageDialog(null, "ERROR: File header immediate origin name is invalid: " + strLine.substring(63, 86));
    errorFound = true;
    fileHeaderErrorFound = true;
    bw.write("ERROR: File header immediate origin name is invalid: " + strLine.substring(63, 86));
    bw.newLine();
}

I believe the error is raised at runtime because the text contains a period and a comma, but I am unsure how to allow these characters in the regex.
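One way to test that hypothesis is to check each character of the rejected value against the original class on its own. The sketch below does this with `java.util.regex`; the class name `RejectedChars` and the hard-coded sample value are illustrative assumptions, and only the pattern itself is taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RejectedChars {
    // The character class from the question's validation code. The commas
    // inside it are literal members of the set, not separators.
    static final Pattern ORIGINAL = Pattern.compile("[A-Z,a-z,0-9, ]");

    // Returns every character of the input that the class does not accept.
    static List<Character> rejected(String value) {
        List<Character> bad = new ArrayList<>();
        for (char c : value.toCharArray()) {
            if (!ORIGINAL.matcher(String.valueOf(c)).matches()) {
                bad.add(c);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        String value = "CITIBANK, N.A. ";
        // Prints [., .]: the comma and spaces pass, only the periods fail.
        System.out.println(rejected(value));
    }
}
```

In this sketch only the two periods are reported, which suggests the comma was never the problem.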

I have also tried this:

if(strLine.substring(63,86).matches("[A-Z,a-z,0-9,,,. ]+")){

It seemed to work, but I wanted to make sure this is the correct way, because it doesn't look right.

You're right in your analysis: the match failed because the text contains a dot (period), which isn't in the character class.

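However, you can simplify the regex. There's no need to repeat the commas: they have no special meaning inside a character class, so each comma you write is simply a literal comma in the set (which is also why commas were already accepted by the original pattern):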

if(strLine.substring(63,86).matches("[A-Za-z0-9,. ]+"))
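Since repeating a comma only adds the same literal character to the set again, the asker's `[A-Z,a-z,0-9,,,. ]+` and the simplified `[A-Za-z0-9,. ]+` should accept exactly the same characters. A minimal sketch that checks this over the ASCII range and against the value from the error message (the class and method names are hypothetical):

```java
import java.util.regex.Pattern;

public class ClassEquivalence {
    // The class the asker tried, with repeated commas.
    static final Pattern ASKER = Pattern.compile("[A-Z,a-z,0-9,,,. ]+");
    // The simplified class suggested in this answer.
    static final Pattern SIMPLIFIED = Pattern.compile("[A-Za-z0-9,. ]+");

    // True if both patterns agree on every single ASCII character (0-127).
    static boolean agreeOnAscii() {
        for (char c = 0; c < 128; c++) {
            String s = String.valueOf(c);
            if (ASKER.matcher(s).matches() != SIMPLIFIED.matcher(s).matches()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnAscii());                                  // true
        System.out.println(SIMPLIFIED.matcher("CITIBANK, N.A. ").matches()); // true
    }
}
```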

Are you sure, though, that you'll never have to match non-ASCII letters or any other kind of punctuation?
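If the answer is no, one possible extension, offered here as an assumption rather than something the existing validator does, is to use Unicode property classes such as `\p{L}` (any letter) and `\p{Nd}` (decimal digit). The extra punctuation (`&`, `'`, `-`) and the class name below are illustrative only; check which characters the file format actually permits before widening the pattern.

```java
import java.util.regex.Pattern;

public class UnicodeOriginName {
    // ASCII-only class from the answer above.
    static final Pattern ASCII_ONLY = Pattern.compile("[A-Za-z0-9,. ]+");
    // Hypothetical wider class: any Unicode letter (\p{L}) or decimal digit
    // (\p{Nd}), plus comma, period, space and a few extra punctuation marks
    // (& ' -) chosen purely for illustration. The hyphen is last so it is literal.
    static final Pattern UNICODE = Pattern.compile("[\\p{L}\\p{Nd},. &'-]+");

    public static void main(String[] args) {
        // "SOCIETE GENERALE" with accented E (U+00C9), written as escapes.
        String name = "SOCI\u00C9T\u00C9 G\u00C9N\u00C9RALE";
        System.out.println(ASCII_ONLY.matcher(name).matches()); // false
        System.out.println(UNICODE.matcher(name).matches());    // true
    }
}
```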

Letters and digits (a-zA-Z0-9) can be replaced by \w, which matches "word" characters (written "\\w" inside a Java string literal). Be aware that in Java \w also matches the underscore _, so it is not an exact substitute. The period and comma don't need escaping inside a character class and can be used as is. Hence this regex might come in handy:

"[\\w,. ]+"
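In Java's default mode `\w` is equivalent to `[a-zA-Z_0-9]`, so it accepts the underscore as well. The sketch below (class name hypothetical) illustrates that, and also shows that a character class without a trailing `+` matches only a single character, which `String.matches` rejects for a multi-character field such as the one in the question.

```java
import java.util.regex.Pattern;

public class WordClassCheck {
    // \w in Java means [a-zA-Z_0-9] by default, so this is equivalent to
    // [A-Za-z0-9_,. ]+ (note the extra underscore). The space is kept because
    // the field value in the question contains spaces.
    static final Pattern WORD_BASED = Pattern.compile("[\\w,. ]+");
    // Without the trailing + the class matches exactly one character, and
    // matches() requires the whole input to match.
    static final Pattern SINGLE_CHAR = Pattern.compile("[\\w,.]");

    public static void main(String[] args) {
        String value = "CITIBANK, N.A. ";
        System.out.println(WORD_BASED.matcher(value).matches());  // true
        System.out.println(SINGLE_CHAR.matcher(value).matches()); // false
        System.out.println(WORD_BASED.matcher("A_B").matches());  // true: _ is a word char
    }
}
```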

Hope this helps. :)

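Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original post's address. For any questions, contact yoyou2525@163.com.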

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM