Read both strings and raw bytes from standard input in Java

I have a program which will be receiving information from an external source via System.in. There are two input modes: line mode and raw mode. During line mode, input is simply a series of UTF-8 strings, each terminated with a line feed character. At some point while in line mode, I will receive notification that I am about to receive N bytes of raw data. At that point the input switches to raw mode and I receive exactly N bytes of raw binary data, which are not valid UTF-8 characters. After this point, it returns to line mode.

Is there a way to easily switch between reading strings and reading raw data? My only thought is to read an InputStream byte by byte and translate to characters as I go. Are there any ways to wrap System.in with multiple types of input streams? I feel like reading from two different wrappers would cause problems.

(FIXED) Update:

I tried parsifal's suggestion, but am running into a problem. To simulate the switching input modes, I modified my test harness. (I realized that another process I have will eventually need to output this way as well.) I don't know if the problem is caused by the send or receive end. When I switch between output modes, it doesn't seem to be reading in the bytes properly. Also, it's always the same byte values that appear. Here are some code excerpts:

FIX: The problem was that apparently you can't switch from the OutputStreamWriter to the OutputStream too quickly. I added a 1ms sleep command before sending the raw bytes, and the problem is solved!

Test Harness:

Process p = processList.get(pubName); //Stored list of started Processes
OutputStream o = p.getOutputStream(); //Returns OutputStream which feeds into stdin
out = new OutputStreamWriter(runPublisher.getOutputStream());

byte[] payload = new byte[25];
out.write("\nPAYLOAD\nRAW\n"); // "RAW\n" signals raw mode
out.write(String.valueOf(payload.length) + "\n");
out.flush();
Thread.sleep(1); //This fixed the problem I was having.
System.out.println(Arrays.toString(payload));
o.write(payload);
o.flush();

Client:

InputStreamReader inReader = new InputStreamReader(System.in);

while(true){
    try{
        if((chIn = inReader.read())!= -1){
            if(chIn == (int)'\n'){
                if(rawMode){
                    if(strIn.equals("ENDRAW"))
                        rawMode = false;
                    else{
                        System.out.println(strIn);
                        //Exception on next line
                        int rawSize = Integer.parseInt(strIn);
                        payload = new byte[rawSize];
                        int t = System.in.read(payload);
                        System.out.println("Read " + t + " bytes");
                        System.out.print(Arrays.toString(payload));
                    }
                }else if(strIn.startsWith("RAW")){
                    rawMode = true;
                }else {
                    // Do other things
                }
                strIn = "";
            }else
                strIn += (char)chIn;
        }else
            break;
    }catch(IOException e){break;}
}

And the outputs (prior to adding the sleep statement) look like this:

Test Harness:
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Client:
25
Read 9 bytes
[83, 72, 85, 84, 68, 79, 87, 78, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

Exception in thread "main" java.lang.NumberFormatException: For input string: "
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Integer.parseInt(Integer.java:470)
    at java.lang.Integer.parseInt(Integer.java:514)
    at myClass.handleCommand(myClass.java:249)

You can wrap System.in with an InputStreamReader that specifies "utf-8" encoding, and then read character-by-character. Accumulate characters into a StringBuilder and dispatch whenever appropriate (nominally when you see '\n', but possibly based on a test of the builder).

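When you want to read binary data, just read from the underlying InputStream (System.in). The InputStreamReader performs translation as needed. Note, however, that its Javadoc allows it to read more bytes from the underlying stream than the current read requires, so raw bytes that have already arrived can be consumed by the reader before you get to them; reading the lines from the raw stream byte by byte avoids this.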

You do not want to use any sort of buffered stream or reader in the stack. This will eliminate any opportunity to use a readLine() method, at least if you confine yourself to the JDK classes.
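A quick way to check whether mixing a reader and the raw stream is safe in your environment is to see how many bytes the reader leaves behind after one line. The sketch below is only an illustration: `ReadAheadCheck` and `bytesLeftAfterFirstLine` are hypothetical names, and a ByteArrayInputStream stands in for System.in. The InputStreamReader Javadoc permits reading ahead, so the second number printed may be smaller than the first; the exact amount is an implementation detail.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class ReadAheadCheck {
    /**
     * Reads one line through an InputStreamReader and reports how many
     * bytes are still available in the underlying stream afterwards.
     */
    static int bytesLeftAfterFirstLine(byte[] input) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(input);
        Reader rd = new InputStreamReader(in, StandardCharsets.UTF_8);
        int ch;
        while ((ch = rd.read()) != -1 && ch != '\n') {
            // discard the characters of the first line
        }
        return in.available();
    }

    public static void main(String[] args) throws IOException {
        // A length line followed by 25 bytes standing in for a raw payload.
        byte[] input = "25\nXXXXXXXXXXXXXXXXXXXXXXXXX".getBytes(StandardCharsets.US_ASCII);
        int afterFirstLine = input.length - "25\n".length();
        System.out.println("bytes following the first line: " + afterFirstLine);
        System.out.println("bytes still in the stream:      " + bytesLeftAfterFirstLine(input));
    }
}
```

If the two numbers differ, the reader has already taken bytes that a later System.in.read(payload) call would have expected to see.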


Edit based on your latest updates:

I think that your switching between raw and cooked mode is a bit suspicious. If I were to implement this, I'd create two primitive operations, String readLine() and byte[] readData(length). The first accumulates characters up to a newline, the second reads a fixed buffer. Then your main loop looks something like this:

InputStream in = // ...
Reader rd = new InputStreamReader(in, "US-ASCII");  // or whatever encoding you use

while (true) {
    String command = readLine(rd);
    if (command.equals("RAW")) {
        // the line after "RAW" carries the payload length
        int length = Integer.parseInt(readLine(rd));
        byte[] data = readData(in, length);
        if (!readLine(rd).equals("ENDRAW")) {
            throw new IOException("protocol violation: expected ENDRAW");
        }
    }
    else {
        // process other commands
    }
}

I would also wrap the whole thing up in an object, which is constructed around the stream, and perhaps uses callbacks to dispatch the data packets.
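As a rough sketch of that wrapper idea, the class below is built around the stream and dispatches through a callback. All names here (`FramedInputReader`, `Handler`, `onLine`, `onPayload`) are hypothetical, and the RAW / length / payload / ENDRAW framing is assumed from the loop above. One deliberate difference: it decodes lines from bytes itself rather than going through a Reader, so nothing can read past the end of a line, and it uses DataInputStream.readFully so a short read cannot truncate the payload.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class FramedInputReader {
    /** Hypothetical callback interface for the two kinds of input. */
    interface Handler {
        void onLine(String line);
        void onPayload(byte[] data);
    }

    private final DataInputStream in;   // DataInputStream adds no buffering of its own
    private final Handler handler;

    FramedInputReader(InputStream in, Handler handler) {
        this.in = new DataInputStream(in);
        this.handler = handler;
    }

    /** Processes input until end of stream. */
    void run() throws IOException {
        String line;
        while ((line = readLine()) != null) {
            if (!line.equals("RAW")) {
                handler.onLine(line);
                continue;
            }
            String lengthLine = readLine();
            if (lengthLine == null) {
                throw new EOFException("stream ended before payload length");
            }
            byte[] data = new byte[Integer.parseInt(lengthLine.trim())];
            in.readFully(data);          // blocks until all bytes arrive, or throws EOFException
            if (!"ENDRAW".equals(readLine())) {
                throw new IOException("protocol violation: expected ENDRAW");
            }
            handler.onPayload(data);
        }
    }

    /** Reads bytes up to the next line feed and decodes them as UTF-8; null at end of stream. */
    private String readLine() throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
            buf.write(b);
        }
        return buf.size() > 0 ? new String(buf.toByteArray(), StandardCharsets.UTF_8) : null;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream sample = new ByteArrayOutputStream();
        sample.write("PAYLOAD\nRAW\n3\n".getBytes(StandardCharsets.UTF_8));
        sample.write(new byte[] {(byte) 0xFF, 0x0A, (byte) 0x80});   // not valid UTF-8
        sample.write("ENDRAW\nSHUTDOWN\n".getBytes(StandardCharsets.UTF_8));

        new FramedInputReader(new ByteArrayInputStream(sample.toByteArray()), new Handler() {
            @Override public void onLine(String line) { System.out.println("line: " + line); }
            @Override public void onPayload(byte[] data) { System.out.println("payload: " + Arrays.toString(data)); }
        }).run();
    }
}
```

In a real program the constructor would be given System.in instead of the in-memory sample.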

The best bet is probably to just read byte-by-byte (using System.in.read()) into a buffer until you hit the UTF-8 line feed byte 0x0A, then translate that byte buffer into a string (using new String(byte[] bytes, "UTF-8")).

Note that read() called on an InputStream returns an int with a value from 0 to 255 (or -1 at end of stream), so you'll need to cast it to a byte. You can accumulate bytes in a Collection of some sort, then use standard Collection framework tools to convert it to an array for consumption by the String constructor.

When you see the indicator that it's going to switch over (presumably some sort of in-stream signalling, certain specific bytes), then switch to your raw byte reading code.
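A minimal sketch of this byte-level approach might look like the following. The class and method names (`MixedInput`, `readLine`, `readData`) are made up for illustration, and a ByteArrayOutputStream is used in place of a Collection simply because it already produces a byte[]. Splitting on 0x0A before decoding is safe for UTF-8 because every byte of a multi-byte sequence is 0x80 or above. The loop in `readData` matters because a single read(byte[]) call may return fewer bytes than requested.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class MixedInput {
    /** Reads bytes up to (not including) the next 0x0A and decodes them as UTF-8; null at end of stream. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;                                   // int, so that -1 (end of stream) is distinguishable
        while ((b = in.read()) != -1) {
            if (b == 0x0A) {
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
            buf.write(b);                        // stores the low 8 bits
        }
        // End of stream: hand back a final unterminated line if there is one.
        return buf.size() > 0 ? new String(buf.toByteArray(), StandardCharsets.UTF_8) : null;
    }

    /** Reads exactly length bytes, looping because read() may deliver fewer per call. */
    static byte[] readData(InputStream in, int length) throws IOException {
        byte[] data = new byte[length];
        int offset = 0;
        while (offset < length) {
            int n = in.read(data, offset, length - offset);
            if (n == -1) {
                throw new EOFException("expected " + length + " bytes, got " + offset);
            }
            offset += n;
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-in for System.in: a length line, three raw bytes, then another line.
        ByteArrayOutputStream sample = new ByteArrayOutputStream();
        sample.write("3\n".getBytes(StandardCharsets.UTF_8));
        sample.write(new byte[] {(byte) 0xFF, 0x0A, (byte) 0x80});
        sample.write("SHUTDOWN\n".getBytes(StandardCharsets.UTF_8));
        InputStream in = new ByteArrayInputStream(sample.toByteArray());

        int length = Integer.parseInt(readLine(in));
        System.out.println(Arrays.toString(readData(in, length)));
        System.out.println(readLine(in));
    }
}
```

Because both methods pull from the same unwrapped stream, there is no second wrapper that could consume bytes belonging to the other mode.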
