
Is there a way to detect if an incoming serialized object stream is GZIPOutputStream compressed or a simple ObjectOutputStream?

I have a legacy system where servers get slowly updated over a period of weeks. The hierarchy is as follows:

1
2
3 4 5

1 is the client pc
2 is a master server
3 4 and 5 are servers across the country.

Currently all of these are sending POJOs (plain old Java objects) back and forth in an uncompressed format. Think ObjectOutputStream() etc.

I'd like to compress the data being serialized over the wire, but in such a way that only the data received in response to a query is compressed. The data being sent down is trivial (query filter data).

Only client #1 and master server #2 are updated right away. Servers #3, #4 and #5 could be updated weeks or months apart from each other. I need a way for server #2 to detect whether the streams coming back from #3, #4 or #5 are compressed and deal with them accordingly (as they get upgraded).

-EDIT- The solution must be unobtrusive for servers #3, #4, and #5. These servers do not have the concept of resending the data if an exception occurs.

Here is an example of code used by #2 to communicate with #3, #4, or #5:

    // Set the content type to be application/x-java-serialized-object
    connection.setRequestProperty("Content-Type", "application/x-java-serialized-object");

    setupHeaderAttributes(getHttpHeaders());

    setupSessionCookies(getHttpHeaders());

    // Load/add httpHeaders
    addHeadersToConnection(connection, getHttpHeaders());

    // Write the serialized object as post data
    objectoutputstream = new ObjectOutputStream(connection.getOutputStream());
    objectoutputstream.writeObject(obj);
    objectoutputstream.flush();

    // Get ready to receive the reply.
    inputstream = connection.getInputStream();
    setHttpStatus(connection.getResponseCode());

Is this possible? Thank you for your time.

-Dennis

You can read the stream's header. GZIPOutputStream writes the GZIP header into the stream before anything else; in hex, it looks like:

0x1f8b 0800 0000 0000 0000

Source

Note that if your legacy servers aren't using Java's GZIPOutputStream, the last 8 bytes may be different. However, the first 2 bytes will always be 0x1f8b. The remaining header values are just information about where the stream came from and some other flags used by the GZIP format.
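As a rough sketch of that header check (not code from the original posts): the hypothetical GzipSniffer.unwrap below peeks at the first two bytes through a PushbackInputStream, pushes them back, and wraps the stream in a GZIPInputStream only when they are 0x1f 0x8b. A plain ObjectOutputStream stream begins with 0xac 0xed, so the two cannot be confused. The class name, method names, and the in-memory round trip in main are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.PushbackInputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; the names are illustrative, not from the original code base.
class GzipSniffer {

    /** Returns a stream that gunzips transparently if the data starts with the GZIP magic bytes. */
    static InputStream unwrap(InputStream raw) throws IOException {
        PushbackInputStream in = new PushbackInputStream(raw, 2);
        byte[] magic = new byte[2];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 2 bytes or end of stream
        while (n < 2) {
            int r = in.read(magic, n, 2 - n);
            if (r < 0) break;
            n += r;
        }
        if (n > 0) {
            in.unread(magic, 0, n); // push the peeked bytes back for the real reader
        }
        boolean gzip = n == 2 && (magic[0] & 0xff) == 0x1f && (magic[1] & 0xff) == 0x8b;
        return gzip ? new GZIPInputStream(in) : in;
    }

    /** Serializes an object in memory, optionally through a GZIPOutputStream (stands in for a remote server). */
    static byte[] serialize(Object obj, boolean gzip) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        OutputStream out = gzip ? new GZIPOutputStream(bytes) : bytes;
        try (ObjectOutputStream oos = new ObjectOutputStream(out)) {
            oos.writeObject(obj);
        } // closing also finishes the GZIP trailer
        return bytes.toByteArray();
    }

    /** Reads one object back, whether or not the bytes were compressed. */
    static Object readReply(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(unwrap(new ByteArrayInputStream(data)))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readReply(serialize("hello", false))); // old server: prints hello
        System.out.println(readReply(serialize("hello", true)));  // upgraded server: prints hello
    }
}
```

On server #2 this would be applied to the reply stream, for example `new ObjectInputStream(GzipSniffer.unwrap(connection.getInputStream()))`, so servers that have not been upgraded keep working unchanged.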

@Puce has half the answer. The other half is to use mark() and reset() to rewind the stream if it's not a GZipped stream:

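    InputStream in = connection.getInputStream(); // stream from server
    in = new BufferedInputStream(in);
    in.mark(1024);

    try {
        in = new GZIPInputStream(in);
    }
    catch (ZipException ex) {
        // Not GZIP: "in" is still the BufferedInputStream, so rewind to the mark
        in.reset();
    }

    // "in" is now ready for use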

The BufferedInputStream serves two purposes here: first, I know that it supports mark/reset. Second, it will improve IO performance if the underlying stream is not buffered (although, if it's a socket stream, it will be).

The mark value of 1024 is arbitrary. The GZIPInputStream constructor should be able to determine whether the underlying stream is GZipped by reading the first two bytes. However, a GZIP header is 10 bytes long, so it might try to read more. It shouldn't read more than 1024 bytes (and if it does, increase the mark value).


Edit: Since I see from your edit that you're using the Content-Type header, you could also use it to optionally unzip the stream: the new servers would return something like x-application/java-gzipped-serialized-object, while the old servers continue to return whatever they return today (for example, application/x-java-serialized-object).

Content types beginning with "x-" are unrestricted; you can use whatever you want, as long as both ends agree.
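A minimal sketch of that Content-Type check, assuming the upgraded servers label compressed replies with the example type above. The ContentTypeUnwrapper class and its method names are hypothetical, and the media type string is only the suggestion from this answer, not a registered type.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper; names and the media type are assumptions for illustration.
class ContentTypeUnwrapper {

    // Assumed type for compressed replies; any "x-" name works if both ends agree on it.
    static final String GZIPPED_TYPE = "x-application/java-gzipped-serialized-object";

    /** True when the reply's Content-Type marks the body as GZIP-compressed. */
    static boolean isGzipped(String contentType) {
        if (contentType == null) {
            return false; // header missing: treat as an old, uncompressed server
        }
        // Drop any parameters after ';' and compare case-insensitively
        String base = contentType.split(";", 2)[0].trim();
        return GZIPPED_TYPE.equalsIgnoreCase(base);
    }

    /** Wraps the body in a GZIPInputStream only when the header says it is compressed. */
    static InputStream unwrap(String contentType, InputStream body) throws IOException {
        return isGzipped(contentType) ? new GZIPInputStream(body) : body;
    }
}
```

With the connection from the question, the call would look something like `ContentTypeUnwrapper.unwrap(connection.getContentType(), connection.getInputStream())`. Unlike sniffing the stream, this depends on the upgraded servers setting the header correctly.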

The constructors of GZIPInputStream throw a ZipException if they cannot handle the input stream.

ZipException - if a GZIP format error has occurred or the compression method used is unsupported
