
Reading first N bytes of a file as an InputStream in Java?

For the life of me, I haven't been able to find a question that matches what I'm trying to do, so I'll explain what my use case is here. If you know of a topic that already covers the answer to this, please feel free to direct me to that one. :)

I have a piece of code that uploads a file to Amazon S3 periodically (every 20 seconds). The file is a log file being written by another process, so this function is effectively a means of tailing the log so that someone can read its contents in semi-real-time without having direct access to the machine the log resides on.

Up until recently, I've simply been using the S3 PutObject method (using a File as input) to do this upload. But in AWS SDK 1.9, this no longer works, because the S3 client rejects the request if the content size actually uploaded is greater than the content-length that was promised at the start of the upload. This method reads the size of the file before it starts streaming the data, so given the nature of this application, the file is very likely to have increased in size between that point and the end of the stream. This means that I now need to ensure I only send N bytes of data regardless of how big the file is.

I don't have any need to interpret the bytes in the file in any way, so I'm not concerned about encoding. I can transfer it byte-for-byte. Basically, what I want is a simple method where I can read the file up to the Nth byte, then have it terminate the read even if there's more data in the file past that point. (In other words, insert EOF into the stream at a specific point.)

For example, if my file is 10000 bytes long when I start the upload, but grows to 12000 bytes during the upload, I want to stop uploading at 10000 bytes regardless of that size change. (On a subsequent upload, I would then upload the 12000 bytes or more.)

I haven't found a pre-made way to do this. The best I've found so far appears to be IOUtils.copyLarge(InputStream, OutputStream, offset, length), which can be told to copy a maximum of "length" bytes to the provided OutputStream. However, copyLarge is a blocking method, as is PutObject (which presumably calls a form of read() on its InputStream), so it seems that I couldn't get that to work at all.

I haven't found any methods or pre-built streams that can do this, so it's making me think I'd need to write my own implementation that directly monitors how many bytes have been read. That would probably then work like a BufferedInputStream where the number of bytes read per batch is the lesser of the buffer size or the remaining bytes to be read. (e.g. with a buffer size of 3000 bytes, I'd do three batches at 3000 bytes each, followed by a batch with 1000 bytes + EOF.)
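That batching idea can be sketched as a small copy loop. This is a minimal, self-contained illustration using in-memory streams in place of the real file and S3 upload; the class and method names are made up for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BoundedCopy {

    /** Copies at most maxBytes from in to out; returns the bytes actually copied. */
    static long copyAtMost(InputStream in, OutputStream out, long maxBytes)
            throws IOException {
        byte[] buf = new byte[3000];  // batch size from the example above
        long copied = 0;
        while (copied < maxBytes) {
            // Read the lesser of the buffer size and the bytes still allowed.
            int want = (int) Math.min(buf.length, maxBytes - copied);
            int n = in.read(buf, 0, want);
            if (n < 0) {
                break;  // underlying stream ended before the cap was reached
            }
            out.write(buf, 0, n);
            copied += n;
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[12000];  // stands in for a file that grew to 12000 bytes
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long n = copyAtMost(new ByteArrayInputStream(data), out, 10000);
        System.out.println(n);  // stops at the 10000-byte cap
    }
}
```

Run as shown, this copies three full 3000-byte batches plus one 1000-byte batch and then stops, even though 2000 more bytes are available.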

Does anyone know a better way to do this? Thanks.

EDIT: Just to clarify, I'm already aware of a couple of alternatives, neither of which is ideal:

(1) I could lock the file while uploading it. Doing this would cause loss of data or operational problems in the process that's writing the file.

(2) I could create a local copy of the file before uploading it. This could be very inefficient and take up a lot of unnecessary disk space (this file can grow into the several-gigabyte range, and the machine it's running on may not have that much disk space to spare).

EDIT 2: My final solution, based on a suggestion from a coworker, looks like this:

private void uploadLogFile(final File logFile) {
    if (logFile.exists()) {
        long byteLength = logFile.length();
        try (
            FileInputStream fileStream = new FileInputStream(logFile);
            // ByteStreams is Guava's com.google.common.io.ByteStreams
            InputStream limitStream = ByteStreams.limit(fileStream, byteLength);
        ) {
            ObjectMetadata md = new ObjectMetadata();
            md.setContentLength(byteLength);
            // Set other metadata as appropriate.
            PutObjectRequest req = new PutObjectRequest(bucket, key, limitStream, md);
            s3Client.putObject(req);
        } // plus exception handling
    }
}

LimitInputStream was what my coworker suggested, apparently not aware that it had been deprecated. ByteStreams.limit is the current Guava replacement, and it does what I want. Thanks, everyone.

Complete answer rip & replace:

It is relatively straightforward to wrap an InputStream so as to cap the number of bytes it will deliver before signaling end-of-data. FilterInputStream is targeted at this general kind of job, but since you have to override pretty much every method for this particular job anyway, it just gets in the way.

Here's a rough cut at a solution:

import java.io.IOException;
import java.io.InputStream;

/**
 * An {@code InputStream} wrapper that provides up to a maximum number of
 * bytes from the underlying stream.  Does not support mark/reset, even
 * when the wrapped stream does, and does not perform any buffering.
 */
public class BoundedInputStream extends InputStream {

    /** This stream's underlying {@code InputStream} */
    private final InputStream data;

    /** The maximum number of bytes still available from this stream */ 
    private long bytesRemaining;

    /**
     * Initializes a new {@code BoundedInputStream} with the specified
     * underlying stream and byte limit
     * @param data the {@code InputStream} serving as the source of this
     *        one's data
     * @param maxBytes the maximum number of bytes this stream will deliver
     *        before signaling end-of-data
     */
    public BoundedInputStream(InputStream data, long maxBytes) {
        this.data = data;
        bytesRemaining = Math.max(maxBytes, 0);
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(data.available(), bytesRemaining);
    }

    @Override
    public void close() throws IOException {
        data.close();
    }

    @Override
    public synchronized void mark(int limit) {
        // does nothing
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        if (bytesRemaining > 0) {
            int nRead = data.read(
                    buf, off, (int) Math.min(len, bytesRemaining));

            // Only count bytes actually read; nRead is -1 at end-of-stream.
            if (nRead > 0) {
                bytesRemaining -= nRead;
            }

            return nRead;
        } else {
            return -1;
        }
    }

    @Override
    public int read(byte[] buf) throws IOException {
        return this.read(buf, 0, buf.length);
    }

    @Override
    public synchronized void reset() throws IOException {
        throw new IOException("reset() not supported");
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = data.skip(Math.min(n, bytesRemaining));

        bytesRemaining -= skipped;

        return skipped;
    }

    @Override
    public int read() throws IOException {
        if (bytesRemaining > 0) {
            int c = data.read();

            if (c >= 0) {
                bytesRemaining -= 1;
            }

            return c;
        } else {
            return -1;
        }
    }
}
