FileInputStream and DataOutputStream - handling the byte[] buffer

Question

I've been working on an app that moves files between two hosts. The transfer works (the code is still really messy, sorry, I'm still fixing it), but I'm left wondering how exactly it handles the buffer. I'm fairly new to networking in Java, and I don't want to end up with a "meh, I got it to work, so let's move on" attitude.

File sending code:

    public void sendFile(String filepath, DataOutputStream dos) throws Exception{
    if (new File(filepath).isFile()&&dos!=null){
        long size = new File(filepath).length();
        String strsize = Long.toString(size) +"\n";
        //System.out.println("File size in bytes: " + strsize);
        outToClient.writeBytes(strsize);
        FileInputStream fis = new FileInputStream(filepath);
        byte[] filebuffer = new byte[8192];

        while(fis.read(filebuffer) > 0){
            dos.write(filebuffer);
            dos.flush();
        }
    }
    }

File receiving code:

    public void saveFile() throws Exception{
    String size = inFromServer.readLine();
    long longsize = Long.parseLong(size);
    //System.out.println(longsize);
    String tmppath = currentpath + "\\" + tmpdownloadname;
    DataInputStream dis = new DataInputStream(clientSocket.getInputStream());
    FileOutputStream fos = new FileOutputStream(tmppath);
    byte[] filebuffer = new byte[8192];
    int read = 0;
    int remaining = (int)longsize;
    while((read = dis.read(filebuffer, 0, Math.min(filebuffer.length, remaining))) > 0){
        //System.out.println(Math.min(filebuffer.length, remaining));
        //System.out.println(read);
        //System.out.println(remaining);
        remaining -= read;
        fos.write(filebuffer,0, read);
    }

    }

I'd like to know how exactly the buffers on both sides are handled so that wrong bytes are not written. (I know how the receiving code avoids that, but I'd still like to know how the byte array is handled.)

Do fis/dis always wait for the buffer to fill up completely? The receiving code always writes either the full array or the remaining length if that is less than filebuffer.length, but what about fis in the sending code?

Answer 1

In fact, your code has a subtle bug, precisely because of the way you handle buffers.

When you read a buffer from the original file, the read(byte[]) method returns the number of bytes actually read. There is no guarantee that all 8192 bytes have been read.

Suppose you have a file with 10000 bytes. Your first read operation reads 8192 bytes. Your second read operation, however, will only read 1808 bytes. The third operation will return -1.

After the first read, you write exactly the bytes that you have read, because you read a full buffer. But after the second read, your buffer contains only 1808 correct bytes. The remaining 6384 bytes are wrong: they are left over from the previous read.

In this case you are lucky, because this only happens in the last buffer that you write. Since your client stops reading once it reaches the length sent in advance, it skips those 6384 wrong bytes, which you shouldn't have sent anyway.

But there is no guarantee that reading from the file will return 8192 bytes even if the end has not been reached yet. The method's contract does not promise that; it is up to the OS and the underlying file system. It could, for example, give you 5000 bytes in your first read and 5000 in your second read. In that case, the first write would already put 3192 wrong bytes in the middle of the file.

Therefore, your code should actually look like this:

byte[] filebuffer = new byte[8192];
int read = 0;
while(( read = fis.read(filebuffer)) > 0){
    dos.write(filebuffer,0,read);
    dos.flush();
}

much like the code you have on the receiving side. This guarantees that only the bytes actually read are written.
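To make the difference concrete, here is a minimal sketch that runs both loops against an in-memory 10000-byte source. The class `ShortReadDemo`, the `ShortReadStream` wrapper (which caps every read at 5000 bytes to imitate the 5000 + 5000 scenario described above) and the helper names are all hypothetical, introduced only for this illustration. A real file system decides for itself how many bytes each read returns.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class ShortReadDemo {

    /** Hypothetical wrapper that returns at most 5000 bytes per read call. */
    static class ShortReadStream extends FilterInputStream {
        ShortReadStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 5000));
        }
    }

    /** Loop from the question: always writes the whole buffer. */
    static byte[] copyWholeBuffer(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (in.read(buffer) > 0) {
            out.write(buffer); // writes 8192 bytes no matter how many were read
        }
        return out.toByteArray();
    }

    /** Corrected loop: writes only the bytes that read() reported. */
    static byte[] copyBytesRead(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) > 0) {
            out.write(buffer, 0, read);
        }
        return out.toByteArray();
    }

    /** 10000 bytes of recognizable sample data. */
    static byte[] sampleData() {
        byte[] data = new byte[10000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 251);
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = sampleData();
        byte[] wrong = copyWholeBuffer(new ShortReadStream(new ByteArrayInputStream(data)));
        byte[] right = copyBytesRead(new ShortReadStream(new ByteArrayInputStream(data)));

        System.out.println("whole-buffer loop wrote " + wrong.length + " bytes"); // 16384
        System.out.println("bytes-read loop wrote " + right.length + " bytes");   // 10000
        System.out.println("bytes-read copy matches source: " + Arrays.equals(data, right)); // true
    }
}
```

The whole-buffer loop emits two full 8192-byte buffers for two 5000-byte reads, so the receiver would see stale bytes starting at offset 5000.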

So there is nothing magical about the way buffers are handled. You give the stream a buffer and tell it how much of the buffer it is allowed to fill, but there is no guarantee it will fill all of it. It may fill less, and you have to take care to use only the portion it tells you it filled.

Another serious mistake you are making, though, is converting the long that you received into an int in this line:

int remaining = (int)longsize;

A file can be larger than an int can represent (Integer.MAX_VALUE is 2^31 - 1 bytes, just under 2 GiB), especially things like long videos. That is why you receive the size as a long in the first place. Don't truncate it like that. Keep remaining as a long and cast to int only after you have taken the minimum (because you know the minimum will always be within the range of an int).

long remaining = longsize;
long fileBufferLen = filebuffer.length;

while((read = dis.read(filebuffer, 0, (int)Math.min(fileBufferLen, remaining))) > 0){
    ...
}
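As a fuller illustration of that receiving loop, here is a minimal sketch with the remaining count kept as a long. The class `BoundedCopy` and the method `copyExactly` are hypothetical names, and throwing an `EOFException` when the stream ends early is an assumption added here, not something the original code does (the original loop simply stops).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BoundedCopy {

    /**
     * Copies exactly {@code size} bytes from in to out and leaves any
     * following bytes unread in the input stream.
     */
    static void copyExactly(InputStream in, OutputStream out, long size) throws IOException {
        byte[] buffer = new byte[8192];
        long remaining = size; // stays a long, so sizes above 2 GiB are not truncated
        while (remaining > 0) {
            // The minimum never exceeds buffer.length, so the cast to int is safe.
            int toRead = (int) Math.min(buffer.length, remaining);
            int read = in.read(buffer, 0, toRead);
            if (read < 0) {
                // Assumption: treat a premature end of stream as an error.
                throw new EOFException("stream ended with " + remaining + " bytes missing");
            }
            out.write(buffer, 0, read);
            remaining -= read;
        }
    }

    public static void main(String[] args) throws IOException {
        // 10000 "file" bytes followed by 5 bytes that belong to a later message.
        byte[] wire = new byte[10005];
        ByteArrayInputStream in = new ByteArrayInputStream(wire);
        ByteArrayOutputStream file = new ByteArrayOutputStream();

        copyExactly(in, file, 10000L);

        System.out.println("copied " + file.size() + " bytes");      // 10000
        System.out.println("left unread: " + in.available());        // 5
    }
}
```

Limiting each read to the remaining count is what keeps the receiver from consuming bytes that belong to whatever is sent after the file.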

By the way, there is no real reason to use a DataOutputStream and DataInputStream for this. The read(byte[]), read(byte[],int,int), write(byte[]), and write(byte[],int,int) methods are inherited from the underlying InputStream and OutputStream classes, and there is no reason not to use the socket's OutputStream / InputStream directly, or to wrap them in a BufferedOutputStream / BufferedInputStream. There is also no need to call flush until you have finished writing.

Also, do not forget to close at least your file input/output streams when you are done with them. You may want to keep the socket input/output streams open for continued communication, but there is no need to keep the files themselves open, and doing so may cause problems. Use try-with-resources to guarantee that they are closed.
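Putting these suggestions together, a sender could look roughly like the sketch below. It assumes the same wire format as the question (the size as decimal text followed by a newline, then the raw bytes). The class `FileSender`, the `File` parameter and the in-memory `ByteArrayOutputStream` standing in for the socket's output stream are hypothetical choices made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class FileSender {

    /** Sends "<size>\n" followed by the file contents; the caller keeps ownership of out. */
    static void sendFile(File file, OutputStream out) throws IOException {
        long size = file.length();
        out.write((size + "\n").getBytes(StandardCharsets.US_ASCII));

        // try-with-resources closes the file even if a write fails.
        try (FileInputStream fis = new FileInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = fis.read(buffer)) > 0) {
                out.write(buffer, 0, read); // only the bytes actually read
            }
        }
        out.flush(); // a single flush once everything has been written
    }

    public static void main(String[] args) throws IOException {
        File tmp = Files.createTempFile("sender-demo", ".bin").toFile();
        try {
            Files.write(tmp.toPath(), new byte[10000]);

            // Stand-in for socket.getOutputStream() in this sketch.
            ByteArrayOutputStream wire = new ByteArrayOutputStream();
            sendFile(tmp, wire);

            // 6 header bytes ("10000\n") plus 10000 file bytes.
            System.out.println("bytes on the wire: " + wire.size()); // 10006
        } finally {
            tmp.delete();
        }
    }
}
```

The socket stream is deliberately not closed inside `sendFile`, so the connection stays usable for further communication.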

Solution 1: accepted, 2019-01-06 11:56:28
