Java parser and regex

I'm writing a parser and am currently working on matching different tokens, but I'm having some trouble with the matching. I have a test file:

while a != b
  if a > b
    a := a - b
  if a <= b
    b := b - a
elihw

And part of my code:

private static Scanner sc = new Scanner(System.in);
private static Pattern tokenPattern = Pattern.compile("[ ]+");
private static Pattern idPattern = Pattern.compile("[a-zA-Z]+");

....main(...) {
      sc.useDelimiter(tokenPattern);
      statement();
    }

public static void statement() {
    System.out.println("Statement");
    String token = null;
    while (sc.hasNext()) {
        if (sc.hasNext(idPattern)) {
            token = sc.next();
            System.out.print(" (" + token + ") ");
        }
        else {
            token = sc.next();
            System.out.print(token + ' ');
        }
    }
}

When I run this method, it matches the identifiers that come before the operators, but not the ones after. The parentheses are there just to mark the tokens it matches. For example, the line

a := a - b

will produce the output:

(a) := (a) - b

I cannot figure out why the b is not matched.

Also, if anyone could help me with a regex that matches operators, that would be great. I have tried many variations of things like this:

[\+\-\*\\]
[\\+\\-\\*\\\]
[+][-][*][/]

But I cannot seem to get it right.

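The scanner is likely including a non-printable (newline) character as part of the token. The delimiter [ ]+ splits only on spaces, so the newline at the end of the line stays attached to the last token, and a token such as b followed by a newline does not fully match [a-zA-Z]+.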

Try this:

private static Pattern tokenPattern = Pattern.compile("[ \r\n\t]+");
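To see the difference between the two delimiters, here is a minimal sketch that feeds a string to a Scanner instead of System.in. The class name DelimiterDemo and the helper mark are made up for illustration; the logic mirrors the statement() loop from the question, and newlines inside tokens are shown as \n so they are visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original code.
class DelimiterDemo {
    static final Pattern ID = Pattern.compile("[a-zA-Z]+");

    // Splits the input on the given delimiter and wraps every token that
    // fully matches ID in parentheses, as statement() does in the question.
    static List<String> mark(String input, String delimiterRegex) {
        List<String> out = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            sc.useDelimiter(Pattern.compile(delimiterRegex));
            while (sc.hasNext()) {
                // hasNext(Pattern) is true only if the whole token matches.
                boolean isId = sc.hasNext(ID);
                String token = sc.next();
                // Make a newline that is part of the token visible.
                String shown = token.replace("\n", "\\n");
                out.add(isId ? "(" + shown + ")" : shown);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "a := a - b\n  if a <= b\n";
        System.out.println(mark(input, "[ ]+"));        // spaces only
        System.out.println(mark(input, "[ \r\n\t]+"));  // spaces, CR, LF, tab
    }
}
```

With the spaces-only delimiter, the last token on each line comes back as b\n and is left unmarked; with the wider delimiter every identifier, including b, is wrapped in parentheses.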

For the operators, try this:

[<>+/*=:-]+
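One thing to watch in a character class like this: a hyphen placed between two characters, as in +-/, is read as a range (here '+' through '/', which also covers ',' and '.'), while a hyphen placed last is a literal. Also, an attempt such as [+][-][*][/] matches only the exact four-character sequence +-*/, not a single operator. The sketch below illustrates both points; the class name OperatorDemo and the method isOperator are hypothetical, and the '!' is an assumption added so that != from the test file is covered.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original code.
class OperatorDemo {
    // Hyphen in the middle: "+-/" is a range covering + , - . /
    static final Pattern RANGE = Pattern.compile("[<>+-/*=:]+");

    // Hyphen last, so it is a literal '-'. The '!' is an added assumption
    // so that "!=" from the test file counts as an operator.
    static final Pattern OPERATOR = Pattern.compile("[<>+/*=:!-]+");

    // True if the whole token consists of operator characters.
    static boolean isOperator(String token) {
        return OPERATOR.matcher(token).matches();
    }

    public static void main(String[] args) {
        for (String t : new String[] {"!=", ":=", "<=", ">", "-", ",", "a"}) {
            System.out.println(t + " -> " + isOperator(t));
        }
        // The range form accepts a comma, which is probably not intended.
        System.out.println(RANGE.matcher(",").matches());
    }
}
```

A pattern like this can be passed to sc.hasNext(...) in the same way as idPattern to tell operators apart from identifiers.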
