Reading strings and raw bytes from standard input in Java

Question

I have a program which will be receiving information from an external source via System.in. There are two input modes: line mode and raw mode. During line mode, input is simply a series of UTF-8 strings, each terminated with a line feed character. At some point while in line mode, I will receive notification that I am about to receive N bytes of raw data. At that point the input switches to raw mode and I receive exactly N bytes of raw binary data, which are not valid UTF-8 characters. After this point, it returns to line mode.

Is there a way to easily switch between reading strings and reading raw data? My only thought is to read an InputStream byte by byte and translate to characters as I go. Are there any ways to wrap System.in with multiple types of input streams? I feel like reading from two different wrappers would cause problems.

(FIXED) Update:

I tried parsifal's suggestion, but am running into a problem. To simulate the switching input modes, I modified my test harness. (I realized that another process I have will eventually need to output this way as well.) I don't know if the problem is caused by the send or receive end. When I switch between output modes, it doesn't seem to be reading in the bytes properly. Also, it's always the same byte values that appear. Here are some code excerpts:

FIX: The problem was that apparently you can't switch from the OutputStreamWriter to OutputStream too quickly. I added a 1ms sleep command before sending the raw bytes, and the problem is solved!

Test Harness:

// Needs java.io.*, java.nio.charset.StandardCharsets and java.util.Arrays
Process p = processList.get(pubName); // Stored list of started Processes
OutputStream o = p.getOutputStream(); // Returns OutputStream which feeds into stdin
Writer out = new OutputStreamWriter(o, StandardCharsets.UTF_8); // text writer over the same stream as o

byte[] payload = new byte[25];
out.write("\nPAYLOAD\nRAW\n"); // "RAW\n" signals raw mode
out.write(payload.length + "\n");
out.flush(); // push the text out before writing raw bytes directly to o
Thread.sleep(1); //This fixed the problem I was having.
System.out.println(Arrays.toString(payload));
o.write(payload);
o.flush();

Client:

InputStreamReader inReader = new InputStreamReader(System.in);

while(true){
    try{
        if((chIn = inReader.read())!= -1){
            if(chIn == (int)'\n'){
                if(rawMode){
                    if(strIn.equals("ENDRAW"))
                        rawMode = false;
                    else{
                        System.out.println(strIn);
                        //Exception on next line
                        int rawSize = Integer.parseInt(strIn);
                        payload = new byte[rawSize];
                        int t = System.in.read(payload);
                        System.out.println("Read " + t + " bytes");
                        System.out.print(Arrays.toString(payload));
                    }
                }else if(strIn.startsWith("RAW")){
                    rawMode = true;
                }else {
                    // Do other things
                }
                strIn = "";
            }else
                strIn += (char)chIn;
        }else
            break;
    }catch(IOException e){break;}
}

And the outputs (prior to adding Sleep statement) look like this:

Test Harness:
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Client:
25
Read 9 bytes
[83, 72, 85, 84, 68, 79, 87, 78, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Exception in thread "main" java.lang.NumberFormatException: For input string: "
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:470)
    at java.lang.Integer.parseInt(Integer.java:514)
    at myClass.handleCommand(myClass.java:249)

Answer 1

You can wrap System.in with an InputStreamReader that specifies "utf-8" encoding, and then read character-by-character. Accumulate characters into a StringBuilder and dispatch whenever appropriate (nominally when you see '\n', but possibly based on a test of the builder).

When you want to read binary data, just read from the underlying InputStream (System.in). The InputStreamReader performs translation as needed. Note, however, that its Javadoc permits it to read ahead more bytes than the current read requires, so raw bytes already waiting on the stream may be consumed by the reader before you get to them.

You do not want to use any sort of buffered stream or reader in the stack. This will eliminate any opportunity to use a readLine() method, at least if you confine yourself to the JDK classes.
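Whether the reader really leaves the raw bytes alone is easy to check on your own JDK. The InputStreamReader Javadoc allows it to read ahead from the underlying stream, and the sketch below measures how many bytes are still available after a single character has been read. The class and method names are made up for illustration, and the in-memory stream stands in for System.in.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ReadAheadDemo {
    /**
     * Reads one character through an InputStreamReader and reports how many
     * bytes are still available on the underlying stream afterwards.
     */
    static int bytesLeftAfterOneChar(byte[] data) {
        try {
            InputStream in = new ByteArrayInputStream(data);
            Reader rd = new InputStreamReader(in, StandardCharsets.UTF_8);
            rd.read();              // ask for a single character only
            return in.available();  // bytes the reader has NOT taken from 'in'
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "RAW\n3\nabc".getBytes(StandardCharsets.UTF_8);
        System.out.println("total bytes: " + data.length);
        System.out.println("left after reading one char: " + bytesLeftAfterOneChar(data));
    }
}
```

If the second number is smaller than the total minus one, the reader has taken bytes it was not asked for, which would explain a raw read that returns data from a later line.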

Edit based on your latest updates:

I think that your switching between raw and cooked mode is a bit suspicious. If I were to implement this, I'd create two primitive operations, String readLine() and byte[] readData(length). The first accumulates characters up to a newline, the second reads a fixed buffer. Then your main loop looks something like this:

InputStream in = System.in;  // or any other InputStream
Reader rd = new InputStreamReader(in, "US-ASCII");  // or whatever encoding you use

while (true) {
    String command = readLine(rd);
    if (command.equals("RAW")) {
        int length = Integer.parseInt(readLine(rd));
        byte[] data = readData(in, length);
        if (!readLine(rd).equals("ENDRAW")) {
            // an exception that indicates protocol violation
            throw new IOException("expected ENDRAW after raw data");
        }
    } else {
        // process other commands
    }
}

I would also wrap the whole thing up in an object, which is constructed around the stream, and perhaps uses callbacks to dispatch the data packets.
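As a rough sketch of that wrapper idea (not the answerer's code; the names ProtocolReader and Handler are made up for illustration), the class below reads both lines and raw blocks directly from one InputStream with no Reader in between, so nothing is consumed ahead of time. It assumes the RAW / length / payload / ENDRAW framing from the loop above and UTF-8 lines.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical wrapper that reads lines and raw blocks from one stream. */
class ProtocolReader {
    /** Hypothetical callback interface used to dispatch what was read. */
    interface Handler {
        void onCommand(String line);
        void onPayload(byte[] data);
    }

    private final InputStream in;

    ProtocolReader(InputStream in) {
        this.in = in;
    }

    /** Reads bytes up to '\n' and decodes them as UTF-8; null at end of stream. */
    String readLine() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
            buf.write(b);
        }
        // End of stream: return a trailing unterminated line, if there is one.
        return buf.size() == 0 ? null : new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Reads exactly 'length' bytes; loops because read() may return fewer. */
    byte[] readData(int length) throws IOException {
        byte[] data = new byte[length];
        int off = 0;
        while (off < length) {
            int n = in.read(data, off, length - off);
            if (n == -1) {
                throw new EOFException("stream ended after " + off + " of " + length + " bytes");
            }
            off += n;
        }
        return data;
    }

    /** Main loop: dispatches plain commands and raw payloads to the handler. */
    void run(Handler handler) throws IOException {
        String command;
        while ((command = readLine()) != null) {
            if (command.equals("RAW")) {
                int length = Integer.parseInt(readLine());
                handler.onPayload(readData(length));
                if (!"ENDRAW".equals(readLine())) {
                    throw new IOException("protocol violation: expected ENDRAW");
                }
            } else {
                handler.onCommand(command);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample input; the payload bytes are not valid UTF-8.
        ByteArrayOutputStream sample = new ByteArrayOutputStream();
        sample.write("HELLO\nRAW\n3\n".getBytes(StandardCharsets.UTF_8));
        sample.write(new byte[] {(byte) 0xFF, 0x00, 0x0A});
        sample.write("ENDRAW\nBYE\n".getBytes(StandardCharsets.UTF_8));

        new ProtocolReader(new ByteArrayInputStream(sample.toByteArray())).run(new Handler() {
            @Override public void onCommand(String line) {
                System.out.println("command: " + line);
            }
            @Override public void onPayload(byte[] data) {
                System.out.println("payload: " + Arrays.toString(data));
            }
        });
    }
}
```

For real use, pass System.in to the constructor instead of the in-memory sample. The loop in readData matters because a single read(byte[]) call is allowed to return fewer bytes than requested.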

Answer 2

The best bet is probably to just read byte-by-byte (using System.in.read()) into a buffer until you hit the UTF-8 line feed byte 0x0A, then translate that byte buffer into a string (using new String(bytes, "UTF-8")).

Note that read() called on an InputStream will return an int with a value from 0 to 255 (or -1 at end of stream), so you'll need to cast it to a byte. You can accumulate bytes in a Collection of some sort, then use standard Collection framework tools to convert it to an array for consumption by the String constructor.
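A minimal sketch of that accumulation, assuming a List<Byte> as the collection (the class and method names here are invented for the example). The Collections framework has no direct conversion from List<Byte> to byte[], so the copy is a small manual loop; a ByteArrayOutputStream would avoid the boxing altogether.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ByteLineReader {
    /** Reads bytes until 0x0A or end of stream and decodes them as UTF-8. */
    static String readLine(InputStream in) throws IOException {
        List<Byte> collected = new ArrayList<>();
        int b;
        while ((b = in.read()) != -1 && b != 0x0A) {
            // read() yields 0..255; the cast keeps the bit pattern (128..255 become negative)
            collected.add((byte) b);
        }
        if (b == -1 && collected.isEmpty()) {
            return null; // nothing left on the stream
        }
        // Unbox into a primitive array for the String constructor.
        byte[] bytes = new byte[collected.size()];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = collected.get(i);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // "h\u00e9llo" contains a two-byte UTF-8 sequence, decoded only once the line is complete.
        byte[] input = "h\u00e9llo\nnext\n".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(input);
        String line;
        while ((line = readLine(in)) != null) {
            System.out.println(line);
        }
    }
}
```

Decoding the whole line at once, rather than byte by byte, is what keeps multi-byte characters intact.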

When you see the indicator that it's going to switch over (presumably some sort of in-stream signalling, certain specific bytes), then switch to your raw byte reading code.

Answer 1: 3 votes, accepted, 2013-01-07 17:47:48

Answer 2: 1 vote, 2013-01-07 17:40:49
