How do newlines affect System.in.read() in Java?
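I am trying to write a Lexer class that tokenizes the characters of an input stream, reading each character with System.in.read(). The documentation says that it returns -1 when the end of the stream is reached, but I can't understand how this behavior differs for different inputs. For example, delete.txt has the input: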
1. I have
2. bulldoz//er
The Lexer then correctly tokenizes it as:
[I=257, have=257, false=259, er=257, bulldoz=257, true=258]
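But if I now insert some blank lines using Enter, the code goes into an infinite loop. The code does check for newlines and spaces in the input, so how does that check get bypassed for this input: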
1. I have
2. bulldoz//er
3.
The complete code is:
package lexer;

import java.io.*;
import java.util.*;
import lexer.Token;
import lexer.Num;
import lexer.Tag;
import lexer.Word;

class Lexer{
    public int line = 1;
    private char null_init = ' ';
    private char tab = '\t';
    private char newline = '\n';
    private char peek = null_init;
    private char comment1 = '/';
    private char comment2 = '*';
    private Hashtable<String, Word> words = new Hashtable<>();

    //no-args constructor
    public Lexer(){
        reserve(new Word(Tag.TRUE, "true"));
        reserve(new Word(Tag.FALSE, "false"));
    }

    void reserve(Word word_obj){
        words.put(word_obj.lexeme, word_obj);
    }

    char read_buf_char() throws IOException {
        char x = (char)System.in.read();
        return x;
    }

    /*tokenization done here*/
    public Token scan()throws IOException{
        for(; ; ){
            // while exiting the loop, sometime the comment
            // characters are read e.g. in bulldoz//er,
            // which is lost if the buffer is read;
            // so read the buffer i
            peek = read_buf_char();
            if(peek == null_init||peek == tab){
                peek = read_buf_char();
                System.out.println("space is read");
            }else if(peek==newline){
                peek = read_buf_char();
                line +=1;
            }
            else{
                break;
            }
        }
        if(Character.isDigit(peek)){
            int v = 0;
            do{
                v = 10*v+Character.digit(peek, 10);
                peek = read_buf_char();
            }while(Character.isDigit(peek));
            return new Num(v);
        }
        if(Character.isLetter(peek)){
            StringBuffer b = new StringBuffer(32);
            do{
                b.append(peek);
                peek = read_buf_char();
            }while(Character.isLetterOrDigit(peek));
            String buffer_string = b.toString();
            Word reserved_word = (Word)words.get(buffer_string);//returns null if not found
            if(reserved_word != null){
                return reserved_word;
            }
            reserved_word = new Word(Tag.ID, buffer_string);
            // put key value pair in words hashtble
            words.put(buffer_string, reserved_word);
            return reserved_word;
        }
        // if character read is not a digit or a letter,
        // then the character read is a new token
        Token t = new Token(peek);
        peek = ' ';
        return t;
    }

    private char get_peek(){
        return (char)this.peek;
    }

    private boolean reached_buf_end(){
        // reached end of buffer
        if(this.get_peek() == (char)-1){
            return true;
        }
        return false;
    }

    public void run_test()throws IOException{
        //loop checking variable
        //a token object is initialized with dummy value
        Token new_token = null;
        // while end of stream has not been reached
        while(this.get_peek() != (char)-1){
            new_token = this.scan();
        }
        System.out.println(words.entrySet());
    }

    public static void main(String[] args)throws IOException{
        Lexer tokenize = new Lexer();
        tokenize.run_test();
    }
}
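The get_peek function returns the value of peek, which holds the current input-buffer character.
The run_test function checks whether the end of the buffer has been reached.
The main processing is done in the scan() function.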
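I use the following command: cat delete.txt|java lexer/Lexer
to feed the file as input to the compiled Java class. Please tell me why this code goes into an infinite loop when the input file has the added newlines?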
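I'm not sure how you are checking for the end of the stream (-1). At the end of scan() you assign a space to peek, and I think that when there are blank lines you never catch the -1, which is where things go wrong.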
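
To make that concrete: with trailing blank lines, the -1 is read inside the whitespace-skipping loop, where (char)-1 becomes '\uffff'. That is neither a letter nor a digit, so scan() falls through to the final branch, wraps it in a Token and resets peek to ' '. run_test() then never sees (char)-1, and every later read() keeps returning -1. Below is a minimal sketch of one way around this, not a drop-in replacement for the Lexer above. It assumes ASCII input, returns plain strings instead of Token objects, and merges the digit and letter cases for brevity. The names EofSafeScanner, next and tokenize are made up for illustration. The idea is to keep peek as an int so -1 is never confused with a character, to read exactly once per skipped whitespace character, and to advance peek rather than overwrite it with a space.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified scanner used only to illustrate EOF handling.
class EofSafeScanner {
    private static final int EOF = -1; // what InputStream.read() returns at end of stream

    private final InputStream in;
    private int peek = ' ';            // int, not char, so EOF stays distinguishable
    int line = 1;

    EofSafeScanner(InputStream in) {
        this.in = in;
    }

    /** Returns the next lexeme, or null once the end of the stream is reached. */
    String next() throws IOException {
        // Skip whitespace with exactly one read per skipped character.
        while (peek == ' ' || peek == '\t' || peek == '\r' || peek == '\n') {
            if (peek == '\n') {
                line++;
            }
            peek = in.read();
        }
        if (peek == EOF) {
            return null;               // peek stays EOF, so later calls return null too
        }
        StringBuilder sb = new StringBuilder();
        if (Character.isLetterOrDigit(peek)) {
            // Simplification: letters and digits are collected into one lexeme.
            do {
                sb.append((char) peek);
                peek = in.read();
            } while (peek != EOF && Character.isLetterOrDigit(peek));
        } else {
            // Single-character token: advance peek instead of resetting it to ' '.
            sb.append((char) peek);
            peek = in.read();
        }
        return sb.toString();
    }

    /** Convenience helper for trying the scanner on an in-memory string. */
    static List<String> tokenize(String text) {
        EofSafeScanner scanner = new EofSafeScanner(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII)));
        List<String> out = new ArrayList<>();
        try {
            for (String t = scanner.next(); t != null; t = scanner.next()) {
                out.add(t);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Same usage as in the question: cat delete.txt | java EofSafeScanner
        EofSafeScanner scanner = new EofSafeScanner(System.in);
        for (String t = scanner.next(); t != null; t = scanner.next()) {
            System.out.println(t);
        }
    }
}
```

In the original class the same effect could probably be had by storing the result of System.in.read() in an int, returning from scan() as soon as it is -1, and dropping the extra read_buf_char() calls inside the whitespace branches, which currently discard one character for every whitespace character they see.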