
How do newlines affect System.in.read() in Java?

I'm trying to write a lexical analyzer class that tokenizes the characters of an input stream, and I use System.in.read() to read the characters. The documentation says that it returns -1 when the end of the stream is reached, but I cannot understand how this behaviour differs for different inputs. For example, delete.txt has the input:

1. I have
2. bulldoz//er

Then the Lexer has correct tokenization as:

[I=257, have=257, false=259, er=257, bulldoz=257, true=258]  

but if I now insert some blank lines by pressing Enter, the code goes into an infinite loop. The code checks the input for newlines and spaces, so how does that check get bypassed? The file is now:

1. I have
2. bulldoz//er
3.    

The full code is:

package lexer;

import java.io.*;
import java.util.*;
import lexer.Token;
import lexer.Num;
import lexer.Tag;
import lexer.Word;

class Lexer{
    public int line = 1;
    private  char null_init = ' ';

    private  char tab = '\t';
    private char newline = '\n';
    private char peek = null_init;
    private char comment1 = '/';
    private char comment2 = '*';
    private Hashtable<String, Word> words = new Hashtable<>();

    //no-args constructor
    public Lexer(){
        reserve(new Word(Tag.TRUE, "true"));
        reserve(new Word(Tag.FALSE, "false"));
    }

    void reserve(Word word_obj){
        words.put(word_obj.lexeme, word_obj);
    }

    char read_buf_char() throws IOException {
        char x = (char)System.in.read();
        return x;
    }

    /*tokenization done here*/
    public Token scan()throws IOException{


        for(; ; ){
            // while exiting the loop, sometimes the comment
            // characters are read, e.g. in bulldoz//er,
            // and they would be lost if the buffer were read again;
            // so read the buffer into peek here
            peek = read_buf_char();
            if(peek == null_init||peek == tab){
                peek = read_buf_char();
                System.out.println("space is read");
            }else if(peek==newline){
                peek = read_buf_char();
                line +=1;
            }
            else{
                break;
            }
        }

        if(Character.isDigit(peek)){
            int v = 0;
            do{
                v = 10*v+Character.digit(peek, 10);
                peek = read_buf_char();
            }while(Character.isDigit(peek));
            return new Num(v);
        }

        if(Character.isLetter(peek)){
            StringBuffer b = new StringBuffer(32);
            do{
                b.append(peek);
                peek = read_buf_char();
            }while(Character.isLetterOrDigit(peek));

            String buffer_string = b.toString();
            Word reserved_word = (Word)words.get(buffer_string);//returns null if not found

            if(reserved_word != null){
                return reserved_word;
            }

            reserved_word = new Word(Tag.ID, buffer_string);
            // put key-value pair in the words hashtable
            words.put(buffer_string, reserved_word);
            return reserved_word;
        }

        // if character read is not a digit or a letter,
        // then the character read is a new token

        Token t = new Token(peek);
        peek = ' ';
        return t;

    }

    private char get_peek(){
        return (char)this.peek;
    }

    private boolean reached_buf_end(){
        // true once the end of the input stream has been reached
        return this.get_peek() == (char)-1;
    }

    public void run_test()throws IOException{
        //loop checking variable
        //a token object is initialized with dummy value
        Token new_token = null;
        // while end of stream has not been reached
        while(this.get_peek() != (char)-1){
            new_token = this.scan();

        }

        System.out.println(words.entrySet());
    }


    public static void main(String[] args)throws IOException{
        Lexer tokenize = new Lexer();
        tokenize.run_test();
    }

}

The get_peek function returns the value of peek, which holds the current character from the input buffer.
The check for whether the end of the buffer has been reached is done in the run_test function.
The main processing is done in the scan() function.

I used the following command: cat delete.txt|java lexer/Lexer to provide the file as input to the compiled Java class. Please tell me why this code goes into an infinite loop when the input file has the extra newline added.
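
For what it's worth, here is a minimal standalone sketch (separate from the lexer; the class name ReadDemo is just for illustration) of what System.in.read() returns at the end of the stream and what casting that value to char turns it into:

import java.io.IOException;

// Sketch: print the raw int values System.in.read() returns and stop at -1.
// Run it the same way, e.g.: cat delete.txt | java ReadDemo
class ReadDemo {
    public static void main(String[] args) throws IOException {
        int c;
        while ((c = System.in.read()) != -1) {
            System.out.println("read int: " + c + ", as char: '" + (char) c + "'");
        }
        // read() has now returned -1 (end of stream); note that casting -1 to char
        // gives the value 0xFFFF, not -1, so an EOF check done on a char is easy to miss.
        System.out.println("end of stream; (char) -1 == '\\uffff' is " + ((char) -1 == '\uffff'));
    }
}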

I am not sure how you are checking for the end of stream (-1). At the end of scan() you assign peek to a space; I think this is what goes wrong when you have a blank line: you are no longer able to catch the -1.
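
To make that concrete, here is a sketch of one minimal change in that spirit (only the tail end of scan() is shown; the rest of the class is assumed to stay as it is): do not overwrite peek with a space when it already holds the end-of-stream marker, so that the check in run_test() can still see it.

        // if character read is not a digit or a letter,
        // then the character read is a new token

        Token t = new Token(peek);
        if (peek != (char) -1) {   // keep the end-of-stream marker (0xFFFF) intact
            peek = null_init;      // only reset peek for ordinary characters
        }
        return t;

An arguably cleaner alternative is to keep the result of System.in.read() as an int and compare it with -1 before ever casting it to char, since the cast turns -1 into the char value 0xFFFF and the original -1 is lost.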
