How can I read a binary file from a socket input stream which includes textual headers?

I have a socket InputStream over which the server sends a data stream containing several lines of header text, followed by the binary bytes of the PDF file it is sending (the length is specified in the header section). The server, which I can't control, does not close the stream after sending its data, so I must read exactly that number of bytes from the stream and then close it myself from the client end.

So, my question is: how can I read the headers (as text) and then read an exact number of bytes from the same input stream, and are there any utilities that make this easy?

I've tried various Reader classes, which work well for the headers but, as I've learned, not for the binary content (Readers work with characters, not bytes). Utilities such as Apache Commons IOUtils don't work for me because the stream remains open and unterminated, so calls like IOUtils.toByteArray(inputStream) hang indefinitely.

The solution seems to be to work with Stream classes rather than Reader classes, but that feels so low-level that there must be utilities out there to help with this. Reading the binary data using a DataInputStream seems easy enough, but I'm stumped as to how to read the headers. Any advice?

EDIT: Here is a sample message:

http/1.0 200 OK
content-type: application/doc_request
content-length: 18813
session-id: slukdcy71292645678312
remote-addr: slukdcy7

<pdf binary data...>

The blank line between the headers and the binary data marks the end of the headers and the start of the binary data.

Bytes can always be converted to text afterwards. I suggest you read all the data as binary and then decode only the header bytes into text.
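As a minimal sketch of that idea, the helper below reads one header line at a time directly from the raw InputStream and decodes it to a String. The class name HeaderLineReader and the method readLine are made up for illustration, and it assumes that lines end in '\n' (optionally preceded by '\r'). Because it never reads ahead, the stream is left positioned at the first byte of the binary data once the blank line has been consumed, which is exactly what a buffering Reader cannot guarantee.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HeaderLineReader {

    /**
     * Reads bytes up to and including the next '\n' and decodes them as text
     * (without the line terminator). Returns null if the stream has already ended.
     */
    public static String readLine(InputStream in, Charset charset) throws IOException {
        int b = in.read();
        if (b == -1) {
            return null;
        }
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        while (b != -1 && b != '\n') {
            if (b != '\r') {          // skip carriage returns so CRLF endings also work
                line.write(b);
            }
            b = in.read();
        }
        return new String(line.toByteArray(), charset);
    }

    public static void main(String[] args) throws IOException {
        // Build an in-memory message: two header lines, a blank line, three binary bytes.
        ByteArrayOutputStream message = new ByteArrayOutputStream();
        message.write("http/1.0 200 OK\ncontent-length: 3\n\n".getBytes(StandardCharsets.US_ASCII));
        message.write(0x25);
        message.write(0x50);
        message.write(0xFF);

        InputStream in = new ByteArrayInputStream(message.toByteArray());
        String line;
        while ((line = readLine(in, StandardCharsets.US_ASCII)) != null && !line.isEmpty()) {
            System.out.println("header: " + line);
        }
        // The next byte read is the first byte of the binary body (0x25 = 37).
        System.out.println("first body byte: " + in.read());
    }
}
```

Reading one byte at a time from a bare socket stream is slow, so in practice it may be worth wrapping the socket stream in a single BufferedInputStream and using that same wrapper for both the headers and the body.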

EDIT: here is a sample solution. It assumes that all headers look like the ones in your example and that the files are small enough to fit into memory. You may want to buffer your input stream.

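import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class HttpFile {
    public final String status;
    public final Map<String, String> properties;
    public final byte[] data;

    public HttpFile(String status, Map<String, String> properties, byte[] data) {
        this.status = status;
        this.properties = properties;
        this.data = data;
    }

    public static HttpFile readFrom(DataInputStream dis, Charset charset) throws IOException {
        // Collect the header bytes up to (but not including) the blank line.
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        int previous = -1;
        while (true) {
            int ch = dis.read();
            if (ch == -1)
                throw new EOFException("Stream ended before the blank line that terminates the headers");
            if (ch == '\r')
                continue; // tolerate CRLF line endings
            if (ch == '\n' && previous == '\n')
                break; // blank line: the binary data starts with the next byte
            baos.write(ch);
            previous = ch;
        }

        // Decode only the header bytes as text and split them into lines.
        String header = new String(baos.toByteArray(), charset);
        String[] lines = header.split("\n");
        String status = lines[0];
        Map<String, String> properties = new LinkedHashMap<String, String>();
        for (int i = 1; i < lines.length; i++) {
            String[] keyValue = lines[i].split(": ", 2);
            if (keyValue.length == 2) // skip lines that are not "name: value"
                properties.put(keyValue[0], keyValue[1]);
        }

        // Read exactly content-length bytes; header names are matched as sent in the sample message.
        byte[] data = null;
        String contentLength = properties.get("content-length");
        if (contentLength != null) {
            int length = Integer.parseInt(contentLength.trim());
            data = new byte[length];
            dis.readFully(data); // blocks until all bytes arrive, throws EOFException if the stream ends early
        }
        return new HttpFile(status, properties, data);
    }
}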
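
If the file may be too large to hold in memory, one possible variation is to copy exactly content-length bytes from the socket stream to another stream (a FileOutputStream, say) through a fixed-size buffer. The sketch below is only an illustration of that idea; the class name ExactCopy, the method copyExactly and the 8192-byte buffer size are assumptions, not part of the solution above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class ExactCopy {

    /**
     * Copies exactly 'length' bytes from 'in' to 'out' and then stops,
     * without waiting for the sender to close the stream.
     */
    public static void copyExactly(InputStream in, OutputStream out, long length) throws IOException {
        byte[] buffer = new byte[8192];
        long remaining = length;
        while (remaining > 0) {
            // Never ask for more than is still expected, so no later bytes are consumed.
            int toRead = (int) Math.min(buffer.length, remaining);
            int n = in.read(buffer, 0, toRead);
            if (n == -1) {
                throw new EOFException("Stream ended with " + remaining + " bytes still expected");
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] source = {1, 2, 3, 4, 5};
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        copyExactly(new ByteArrayInputStream(source), sink, 3);
        System.out.println(sink.size()); // prints 3
    }
}
```

With the headers already parsed, the length argument would come from the content-length header, and the client can close the socket as soon as copyExactly returns.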
