Java Simple Lexer Program

I created a simple lexer program in Java that prompts the user for a string and displays the lexemes in that string. However, when the input includes a left or right parenthesis, a null character appears to be added after the parenthesis, and the program reports it as an identifier.

Also, if I don't include left and right parentheses in the input string, the last character in the string is not evaluated as a lexeme.

Here is my code:

import javax.swing.JOptionPane;

public class Append
{
  public static void main (String [] args)
  {
    String str = JOptionPane.showInputDialog("Enter string : ");
    char [] arr = str.toCharArray();

    JOptionPane.showMessageDialog(null,arr.length);

    determineLexemes(arr);

   }

  public static void determineLexemes(char [] arr)
  {
    int j = 0;

    String [] arrayString = new String [1000];

    String strTwo = "";

    System.out.println("Symbol Table");

    System.out.println("Lexeme\t\tToken");

    for(int i = 0; i < arr.length; i++)
    {

       if(arr[i] == '+')
            {
                System.out.println("+ \t\t ADD_OP");
            }

       if(arr[i] == '-')
            {
                System.out.println("- \t\t SUB_OP");
            }

       if(arr[i] == '*')
            {
                System.out.println("* \t\t MULT_OP");
            }

       if(arr[i] == '/')
            {
                System.out.println("/ \t\t DIV_OP");
            }

       if(arr[i] == '(')
            {
                System.out.println("( \t\t LEFT_PAREN");
            }

       if(arr[i] == ')')
            {
                System.out.println(") \t\t RIGHT_PAREN");
            }

       if(arr[i] == '=')
            {
                System.out.println("= \t\t EQUAL_OP");
            }

       if(Character.isLetter(arr[i]) || Character.isDigit(arr[i]))
        {
            strTwo += arr[i];
        }

       if(!Character.isLetter(arr[i]) && !Character.isDigit(arr[i]))
        {
            if(!(Character.isWhitespace(arr[i])))
            {
                arrayString[j] = strTwo;
                System.out.println(arrayString[j] + "\t\t" + "IDENTIFIER");
                strTwo = "";
                j++;

            }
        }
    }


 }
}

Any help resolving the problem is appreciated.

The problem is that you do not maintain state in your lexer. Recognizing a regular language can be done with a finite automaton, which is a simple mechanism that keeps track of its state (and may maintain a buffer for accumulating longer lexemes).
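To make the idea of a finite automaton concrete, here is a minimal sketch that only decides whether a whole string is an identifier (a letter followed by letters or digits). The class name IdentifierAutomaton, the state names, and the step and accepts methods are illustrative assumptions, not part of the original code.

```java
public class IdentifierAutomaton {
    // Hypothetical states: START (nothing read yet), IN_IDENTIFIER (accepting),
    // REJECT (dead state that can never be left).
    enum State { START, IN_IDENTIFIER, REJECT }

    // Transition function: maps the current state and one input character to the next state.
    static State step(State state, char c) {
        switch (state) {
            case START:
                return Character.isLetter(c) ? State.IN_IDENTIFIER : State.REJECT;
            case IN_IDENTIFIER:
                return Character.isLetterOrDigit(c) ? State.IN_IDENTIFIER : State.REJECT;
            default:
                return State.REJECT;
        }
    }

    // Runs the automaton over the whole input and reports whether it ends in the accepting state.
    static boolean accepts(String input) {
        State state = State.START;
        for (char c : input.toCharArray()) {
            state = step(state, c);
        }
        return state == State.IN_IDENTIFIER;
    }

    public static void main(String[] args) {
        System.out.println(accepts("sum1"));
        System.out.println(accepts("1sum"));
    }
}
```

The only information carried from one character to the next is the current state, which is exactly what the original loop lacks. A lexer extends this by also keeping a buffer and emitting a token whenever a lexeme ends.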

Initially, set the state to S0. Each operator or parenthesis is recognized in S0, and you stay in S0. On a letter, you enter SI (IDENTIFIER in the code below) and remain there while you read more letters and digits. An operator terminates SI: you emit the identifier, return to S0, and handle the operator there. Recognizing a digit in S0 enters SN (NUMBER in the code below), which you handle in a way similar to SI.

// Declare the enum at class level, outside determineLexemes.
enum State { S0, IDENTIFIER, NUMBER }

State state = State.S0;
for(int i = 0; i < arr.length; i++) {
    switch(state) {
    case S0:
        switch(arr[i]) {
        case '+':
            System.out.println("+ \t\t ADD_OP");
            break;
        //...
        default:
            if(Character.isLetter(arr[i])) {
                // A letter starts an identifier.
                strTwo = "" + arr[i];
                state = State.IDENTIFIER;
            } else if(Character.isDigit(arr[i])) {
                // A digit starts a number.
                strTwo = "" + arr[i];
                state = State.NUMBER;
            }
            // Anything else (for example whitespace) is skipped.
            break;
        }
        break; // leave the outer switch; without this, control falls through into IDENTIFIER
    case IDENTIFIER:
        if(Character.isLetter(arr[i]) || Character.isDigit(arr[i])) {
            strTwo += arr[i];
        } else {
            System.out.println(strTwo + "\t\t" + "IDENTIFIER");
            i--; // reprocess this character in state S0
            state = State.S0;
        }
        break;
    case NUMBER:
        if(Character.isDigit(arr[i])) {
            strTwo += arr[i];
        } else {
            System.out.println(strTwo + "\t\t" + "NUMBER");
            i--; // reprocess this character in state S0
            state = State.S0;
        }
        break;
    }
}

Something is still missing here: handling a number or identifier at the end of the input string. After the loop ends, examine the variable state; if it is IDENTIFIER or NUMBER, emit the contents of strTwo with the corresponding token name.
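Putting the pieces together, the following is one possible complete version, including the end-of-input step described above. It is a sketch under some assumptions: the class name SimpleLexer, the methods tokenize and operatorToken, and the choice to collect output lines in a list instead of printing them directly are all illustrative and not from the original post. Whitespace and unrecognized characters are silently skipped.

```java
import java.util.ArrayList;
import java.util.List;

public class SimpleLexer {
    enum State { S0, IDENTIFIER, NUMBER }

    // Hypothetical helper: token name for a one-character operator or parenthesis, else null.
    static String operatorToken(char c) {
        switch (c) {
            case '+': return "ADD_OP";
            case '-': return "SUB_OP";
            case '*': return "MULT_OP";
            case '/': return "DIV_OP";
            case '(': return "LEFT_PAREN";
            case ')': return "RIGHT_PAREN";
            case '=': return "EQUAL_OP";
            default:  return null;
        }
    }

    // Returns one "lexeme<TAB><TAB>token" line per recognized lexeme.
    static List<String> tokenize(String input) {
        List<String> out = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        State state = State.S0;
        char[] arr = input.toCharArray();

        for (int i = 0; i < arr.length; i++) {
            char c = arr[i];
            switch (state) {
                case S0: {
                    String op = operatorToken(c);
                    if (op != null) {
                        out.add(c + "\t\t" + op);
                    } else if (Character.isLetter(c)) {
                        buffer.append(c);
                        state = State.IDENTIFIER;
                    } else if (Character.isDigit(c)) {
                        buffer.append(c);
                        state = State.NUMBER;
                    }
                    // Whitespace and unknown characters are skipped.
                    break;
                }
                case IDENTIFIER:
                    if (Character.isLetterOrDigit(c)) {
                        buffer.append(c);
                    } else {
                        out.add(buffer + "\t\t" + "IDENTIFIER");
                        buffer.setLength(0);
                        state = State.S0;
                        i--; // reprocess this character in state S0
                    }
                    break;
                case NUMBER:
                    if (Character.isDigit(c)) {
                        buffer.append(c);
                    } else {
                        out.add(buffer + "\t\t" + "NUMBER");
                        buffer.setLength(0);
                        state = State.S0;
                        i--; // reprocess this character in state S0
                    }
                    break;
            }
        }

        // End of input: flush a lexeme that was still being accumulated.
        if (state == State.IDENTIFIER) {
            out.add(buffer + "\t\t" + "IDENTIFIER");
        } else if (state == State.NUMBER) {
            out.add(buffer + "\t\t" + "NUMBER");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("Lexeme\t\tToken");
        for (String line : tokenize("total = (a1 + 42) * b")) {
            System.out.println(line);
        }
    }
}
```

Because a token is only emitted when a lexeme actually ends, no empty identifier is printed after a parenthesis, and the final flush ensures the last lexeme is reported even when the input does not end with an operator.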
