
Formatting Regex For Java.Util.Scanner

I am wondering how to format this expression to work in Java: [^#]+[#] (one or more characters that are not a #, followed by a #).

Using regexr.com (my favorite regex tool), this expression gets the following matches from this input text:

input:

aBc def AbC def dfe ABC
#
123
#

matches:

aBc def AbC def dfe ABC
#
123
#

However, when using Scanner.next("[^#]+[#]") I get an InputMismatchException, which I take to mean that it didn't find any matches. Do I need to escape characters? In C# I usually avoid this problem with the verbatim string literal prefix @.

What am I missing about Java's Scanner and regex? Thanks.

My solution was to use the Pattern and Matcher classes instead of the Scanner. The Scanner class didn't behave as I expected with stdin or strings, and it failed to get matches based on a regex (using the hasNext(Regex) and next(Pattern) methods). If I read more and discover why, I will post here.
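One likely explanation, offered as a sketch and not a confirmed diagnosis of the original program: Scanner.next(String) does not search the input for the pattern. It takes the next token, which is split on whitespace by default, and checks whether that whole token matches. With the input from the question the first token is "aBc", which contains no #, so an InputMismatchException is thrown. No extra escaping is needed, since the pattern contains no backslashes. Scanner.findWithinHorizon ignores delimiters and behaves more like the regexr.com search. The class name ScannerRegexDemo and its helper methods below are made up for illustration.

```java
import java.util.ArrayList;
import java.util.InputMismatchException;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class; the names are illustrative, not from the original post.
final class ScannerRegexDemo {
    static final String INPUT = "aBc def AbC def dfe ABC\n#\n123\n#\n";

    // Scanner.next(String) tests the pattern against the next whole token only.
    // Returns false when the token does not match and InputMismatchException is thrown.
    static boolean nextMatches(String input, String regex) {
        try (Scanner scanner = new Scanner(input)) {
            scanner.next(regex);
            return true;
        } catch (InputMismatchException e) {
            return false;
        }
    }

    // findWithinHorizon searches the input while ignoring delimiters;
    // a horizon of 0 means the search is not bounded.
    static List<String> findAll(String input, String regex) {
        List<String> matches = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            String match;
            while ((match = scanner.findWithinHorizon(regex, 0)) != null) {
                matches.add(match);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // The first whitespace-delimited token is "aBc", so this prints false.
        System.out.println(nextMatches(INPUT, "[^#]+[#]"));
        // Each match runs up to and including a '#', newlines included.
        for (String m : findAll(INPUT, "[^#]+[#]")) {
            System.out.println("[" + m + "]");
        }
    }
}
```

If only the text between the # characters is wanted, calling useDelimiter("#") on the Scanner and then next() without a pattern is another option. Whether either approach fits depends on how the input is actually read.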

The following successfully pulls each word (in this case a sequence of consecutive alphabetic letters) from a string:

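// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
// lines, currentWord and AddWord are defined elsewhere in the program.
Pattern wordPattern = Pattern.compile("\\p{Alpha}+");
Matcher wordFinder = wordPattern.matcher(lines.toString());
// find() moves to the next match anywhere in the text, regardless of token boundaries.
while (wordFinder.find()) {
    currentWord = wordFinder.group().toLowerCase();
    AddWord(currentWord);
}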

The POSIX character class "\\p{Alpha}+" could also be replaced with [a-zA-Z]+.
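As a self-contained sketch of the same idea, the two patterns can be compared on the input text from the question. The class name WordExtractor and the method words are hypothetical stand-ins for the surrounding program, which is not shown in the post. By default Java's \p{Alpha} covers only US-ASCII letters, which is why the two patterns should agree here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from the original post.
final class WordExtractor {
    // Collects every match of regex in text, lower-cased.
    static List<String> words(String text, String regex) {
        List<String> result = new ArrayList<>();
        Matcher wordFinder = Pattern.compile(regex).matcher(text);
        while (wordFinder.find()) {
            // Locale.ROOT keeps lower-casing independent of the default locale.
            result.add(wordFinder.group().toLowerCase(Locale.ROOT));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "aBc def AbC def dfe ABC\n#\n123\n#";
        // Both lines print [abc, def, abc, def, dfe, abc]
        System.out.println(words(text, "\\p{Alpha}+"));
        System.out.println(words(text, "[a-zA-Z]+"));
    }
}
```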
