How do newlines affect System.in.read() in Java?
I'm trying to write a lexical analyzer class that tokenizes the characters of the input stream, and I use System.in.read() to read them. The documentation says that it returns -1 when the end of the stream is reached, but I cannot understand why this behaviour changes with different input. For example, delete.txt has the input:
1. I have
2. bulldoz//er
With this input the Lexer tokenizes correctly:
[I=257, have=257, false=259, er=257, bulldoz=257, true=258]
But if I now insert some blank lines by pressing Enter, the code goes into an infinite loop. The code checks for newlines and spaces in the input, so how does that check get bypassed? With the blank lines added, the input is:
1. I have
2. bulldoz//er
3.
The full code is:
package lexer;
import java.io.*;
import java.util.*;
import lexer.Token;
import lexer.Num;
import lexer.Tag;
import lexer.Word;
class Lexer{
public int line = 1;
private char null_init = ' ';
private char tab = '\t';
private char newline = '\n';
private char peek = null_init;
private char comment1 = '/';
private char comment2 = '*';
private Hashtable<String, Word> words = new Hashtable<>();
//no-args constructor
public Lexer(){
reserve(new Word(Tag.TRUE, "true"));
reserve(new Word(Tag.FALSE, "false"));
}
void reserve(Word word_obj){
words.put(word_obj.lexeme, word_obj);
}
char read_buf_char() throws IOException {
char x = (char)System.in.read();
return x;
}
/*tokenization done here*/
public Token scan()throws IOException{
for(; ; ){
// while exiting the loop, sometime the comment
// characters are read e.g. in bulldoz//er,
// which is lost if the buffer is read;
// so read the buffer i
peek = read_buf_char();
if(peek == null_init||peek == tab){
peek = read_buf_char();
System.out.println("space is read");
}else if(peek==newline){
peek = read_buf_char();
line +=1;
}
else{
break;
}
}
if(Character.isDigit(peek)){
int v = 0;
do{
v = 10*v+Character.digit(peek, 10);
peek = read_buf_char();
}while(Character.isDigit(peek));
return new Num(v);
}
if(Character.isLetter(peek)){
StringBuffer b = new StringBuffer(32);
do{
b.append(peek);
peek = read_buf_char();
}while(Character.isLetterOrDigit(peek));
String buffer_string = b.toString();
Word reserved_word = (Word)words.get(buffer_string);//returns null if not found
if(reserved_word != null){
return reserved_word;
}
reserved_word = new Word(Tag.ID, buffer_string);
// put the key-value pair in the words hashtable
words.put(buffer_string, reserved_word);
return reserved_word;
}
// if character read is not a digit or a letter,
// then the character read is a new token
Token t = new Token(peek);
peek = ' ';
return t;
}
private char get_peek(){
return (char)this.peek;
}
private boolean reached_buf_end(){
// reached end of buffer
if(this.get_peek() == (char)-1){
return true;
}
return false;
}
public void run_test()throws IOException{
//loop checking variable
//a token object is initialized with dummy value
Token new_token = null;
// while end of stream has not been reached
while(this.get_peek() != (char)-1){
new_token = this.scan();
}
System.out.println(words.entrySet());
}
public static void main(String[] args)throws IOException{
Lexer tokenize = new Lexer();
tokenize.run_test();
}
}
The get_peek function returns the value of peek, which holds the current input buffer character.
The check for whether the end of the buffer has been reached is done in the run_test function.
The main processing is done in the scan() function.
I used the following command to provide the file as input to the compiled Java class:
cat delete.txt|java lexer/Lexer
Please tell me why this code goes into an infinite loop when newlines are added to the input file.
I am not sure how you are checking for the end of the stream (-1). At the end of scan() you assign a space to peek. I think this is what goes wrong when the input has a blank line: if the -1 is read on that path, it is overwritten with the space before run_test() can check it, so the -1 is never caught.
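One way to avoid this, as a rough sketch rather than a drop-in fix for your Lexer: keep the value returned by read() in an int and compare it with -1 before casting to char, so that nothing can overwrite the end-of-stream marker. The class name EofSketch and the helpers words and wordsOf are made up for this illustration, and the sketch reads from a ByteArrayInputStream instead of System.in so that it runs without an input file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not the Lexer from the question.
class EofSketch {

    // Collects runs of letters/digits, stopping only when read() returns -1.
    static List<String> words(InputStream in) throws IOException {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int c; // kept as int so -1 is never confused with a real character
        while ((c = in.read()) != -1) {
            char ch = (char) c; // cast only after the -1 check
            if (Character.isLetterOrDigit(ch)) {
                current.append(ch);
            } else if (current.length() > 0) {
                // any other character (space, newline, '/') ends the word
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString()); // word that ran up to end of stream
        }
        return out;
    }

    // Convenience wrapper (assumption: ASCII input, as in delete.txt).
    static List<String> wordsOf(String text) {
        byte[] data = text.getBytes(StandardCharsets.US_ASCII);
        try {
            return words(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same text as delete.txt, followed by extra blank lines.
        System.out.println(wordsOf("I have\nbulldoz//er\n\n\n"));
        // prints [I, have, bulldoz, er]
    }
}
```

The trailing blank lines no longer matter here because the loop condition looks at the raw int from read() on every iteration. In your Lexer the equivalent change would be to make scan() itself report end of stream instead of relying on whatever peek holds after it returns.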