
How is InputStream managed in memory?

I am familiar with the concept of InputStream, buffers and why they are useful (for example, when you need to work with data that might be larger than the machine's RAM).

I was wondering, though: how does the InputStream actually carry all that data? Could an OutOfMemoryError be caused if there is too much data being transferred?

Case scenario

If I connect from a client to a server, requesting a 100GB file, the server starts iterating through the bytes of the file with a buffer and writing the bytes back to the client with outputStream.write(byte[]). The client is not ready to read the InputStream right now, for whatever reason. Will the server continue sending the bytes of the file indefinitely? And if so, won't the outputstream/inputstream be larger than the RAM of one of these machines?

InputStream and OutputStream implementations do not generally use a lot of memory. In fact, the word "Stream" in these types means that they do not need to hold the data, because it is accessed sequentially, in the same way that a stream can carry water between a lake and the ocean without holding much water itself.
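To make the "does not need to hold the data" point concrete, here is a minimal Java sketch of the kind of copy loop the question describes. `GeneratedInputStream` is a hypothetical stand-in for a large file that produces its bytes on demand, and `OutputStream.nullOutputStream()` (Java 11+) stands in for the client connection. However large `size` is, the only payload held in memory is the one 8 KB buffer that gets reused for every chunk.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.Objects;

public class ConstantMemoryCopy {
    /** Hypothetical source that produces `size` bytes on demand without ever storing them. */
    static final class GeneratedInputStream extends InputStream {
        private long remaining;

        GeneratedInputStream(long size) {
            this.remaining = size;
        }

        @Override
        public int read() {
            if (remaining <= 0) return -1;
            remaining--;
            return 'x';
        }

        @Override
        public int read(byte[] b, int off, int len) {
            Objects.checkFromIndexSize(off, len, b.length);
            if (len == 0) return 0;
            if (remaining <= 0) return -1;
            int n = (int) Math.min(len, remaining);
            Arrays.fill(b, off, off + n, (byte) 'x'); // generate the next chunk on the fly
            remaining -= n;
            return n;
        }
    }

    /** Copies in to out through one reusable 8 KB buffer and returns the number of bytes copied. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192]; // the only payload held in memory at any moment
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);    // hand this chunk on, then reuse the buffer
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        long size = 1L << 30; // 1 GiB passes through, but only 8 KB of it is in the buffer at a time
        try (InputStream in = new GeneratedInputStream(size);
             OutputStream out = OutputStream.nullOutputStream()) {
            System.out.println("copied " + copy(in, out) + " bytes");
        }
    }
}
```

With a real file you would pass a `FileInputStream` and the socket's `OutputStream` instead; the memory picture stays the same.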

But "stream" is not the best word to describe this. It's more like a pipe, because when you transfer data from a server to a client, every stage passes back-pressure from the client along, and that back-pressure controls the rate at which data gets sent. This is similar to how your faucet controls the rate of flow through your pipes all the way to the city reservoir:

  1. As the client reads data, its InputStream only requests more data from the OS when its internal (small) buffers are empty. Each request allows only a limited amount of data to be transferred.
  2. As data is requested from the OS, the OS's own internal buffer empties, and it notifies the server about how much space there is for new data. The server can send only this much (that's called 'flow control' in TCP: https://en.wikipedia.org/wiki/Transmission_Control_Protocol#Resource_usage ).
  3. On the server side, the server-side OS sends out data from its own internal buffer when the client has space to receive it. As its own internal buffer empties, it allows the writing process to re-fill it with more data.
  4. As the server-side process write()s to its OutputStream, the OutputStream will try to write data to the OS. When the OS buffer is full, the OS will make the server process wait until the server-side buffer has space to accept new data.
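The four steps above can be watched on one machine over a loopback connection. This is a minimal Java sketch rather than a benchmark: the client connects and then never calls `read()`, while a server thread keeps writing 8 KB chunks. The writer only gets as far as the OS send and receive buffers allow, then blocks inside `write()`. The 256 MB target, the chunk size and the waiting time are arbitrary values chosen for the illustration, and the number of bytes written before blocking depends on the OS socket buffer sizes, so it will differ between machines.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicLong;

public class TcpBackPressure {
    // Arbitrary target, far more than the OS send + receive buffers of a loopback connection.
    static final long TOTAL = 256L * 1024 * 1024;

    /** Bytes the server managed to write while the connected client read nothing. */
    static long bytesWrittenWithoutReader(long waitMillis) throws Exception {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, lo);
             Socket client = new Socket(lo, listener.getLocalPort()); // never calls read()
             Socket server = listener.accept()) {
            AtomicLong written = new AtomicLong();
            Thread writer = new Thread(() -> {
                byte[] chunk = new byte[8192];
                try {
                    OutputStream out = server.getOutputStream();
                    while (written.get() < TOTAL) {
                        out.write(chunk); // blocks once the OS buffers on both ends are full
                        written.addAndGet(chunk.length);
                    }
                } catch (IOException e) {
                    // Expected: the socket is closed while this write is blocked.
                }
            });
            writer.setDaemon(true);
            writer.start();
            Thread.sleep(waitMillis); // the "slow client": give the writer time to fill the buffers
            return written.get();
        } // closing the sockets here releases the blocked writer
    }

    public static void main(String[] args) throws Exception {
        long n = bytesWrittenWithoutReader(1000);
        System.out.printf("wrote %,d of %,d bytes, then blocked%n", n, TOTAL);
    }
}
```

Closing the sockets at the end of the `try` block is what releases the blocked writer (its `write()` fails with an `IOException`); making it a daemon thread is only a safety net so the JVM can exit regardless.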

Notice that a slow client can make the server process take a very long time. If you're writing a server and you don't control the clients, then it's very important to consider this and to ensure that not a lot of server-side resources are tied up while a long data transfer takes place.

Your question is as interesting as it is difficult to answer properly.

  • First: InputStream and OutputStream are not a means of storage but a means of access: they describe that the data shall be accessed in sequential, unidirectional order, but not how it shall be stored. The actual way of storing the data is implementation-dependent.

So, could there be an InputStream that stores the whole amount of data in memory at once? Yes, it could, though that would be an appalling implementation. The most common and sensible implementation of InputStreams / OutputStreams stores just a small, fixed amount of data in a temporary buffer of, for example, 4K-8K.

(I suppose you already knew that, but it needed to be said.)
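`BufferedInputStream` is a typical example of the fixed-buffer design; in the OpenJDK sources its default buffer is 8192 bytes. The sketch below wraps a source in `CountingInputStream`, a hypothetical `FilterInputStream` subclass written only for this illustration, to show that even when the caller reads one byte at a time, the underlying stream is only asked for chunks of at most 8 KB. The `ByteArrayInputStream` used as the source is, incidentally, one legitimate `InputStream` that does keep all of its data in memory, because it wraps an array that already exists.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferedChunks {
    /** Hypothetical wrapper that records how the wrapped stream is being read. */
    static final class CountingInputStream extends FilterInputStream {
        int calls;          // number of read(byte[], int, int) calls that reached this stream
        int largestRequest; // largest len requested by the caller (i.e. by the buffer)
        long delivered;     // bytes actually returned

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            largestRequest = Math.max(largestRequest, len);
            int n = super.read(b, off, len);
            if (n > 0) delivered += n;
            return n;
        }
    }

    /** Reads `size` bytes one at a time through a default-sized BufferedInputStream. */
    static CountingInputStream readOneByteAtATime(int size) throws IOException {
        // The array stands in for a file or socket; only the wrapper chain matters here.
        CountingInputStream source = new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        try (InputStream in = new BufferedInputStream(source)) {
            while (in.read() != -1) {
                // served from the internal buffer; only a refill touches `source`
            }
        }
        return source;
    }

    public static void main(String[] args) throws IOException {
        CountingInputStream s = readOneByteAtATime(100_000);
        System.out.printf("%d bytes via %d underlying reads of at most %d bytes each%n",
                s.delivered, s.calls, s.largestRequest);
    }
}
```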

  • Second: What about connected writing/reading streams between a server and a client? In a common scenario of buffered writing, the server will not write more data than the buffer allows. So, if the server starts writing and the client then goes down (for whatever reason), the server will just keep writing until the buffer is full, then mark it as ready for reading, and until that read has been completed (by the client peer), the server won't fill the buffer again. Remember: this kind of read/write is blocking. The client blocks until there is a buffer ready to be read, and the server blocks (or at least the server thread bound to this connection) until the last read is completed.
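The "fill the buffer, then wait for the reader" behaviour can be sketched in-process with `PipedOutputStream`/`PipedInputStream`, whose pipe has a fixed capacity (1024 bytes here, an arbitrary value). This is an analogy for the socket case, not socket code: a writer thread tries to push 4096 bytes, fills the pipe, and stays blocked until the reading thread starts draining it. The 200 ms sleep plays the role of the slow client, and `Result`/`run` are names made up for this sketch.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;

public class PipeBackPressure {
    /** What the sketch observed: was the writer stuck, how full was the pipe, how much arrived. */
    record Result(boolean writerBlocked, int bufferedBeforeRead, long totalRead) {}

    static Result run() throws IOException, InterruptedException {
        PipedOutputStream out = new PipedOutputStream();
        try (PipedInputStream in = new PipedInputStream(out, 1024)) { // pipe capacity: 1 KB
            Thread writer = new Thread(() -> {
                try {
                    out.write(new byte[4096]); // 4x the capacity: blocks until the reader drains it
                    out.close();               // tells the reader there is no more data
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            writer.start();

            Thread.sleep(200);                  // the "slow client": nothing is read yet
            boolean blocked = writer.isAlive(); // still stuck inside write()
            int buffered = in.available();      // bytes waiting in the full pipe

            long total = 0;
            byte[] chunk = new byte[256];
            int n;
            while ((n = in.read(chunk)) != -1) {
                total += n;                     // each read frees space, so the writer can continue
            }
            writer.join();
            return new Result(blocked, buffered, total);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```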

How long will the server block? Typically, a server should have a safety timeout to ensure that long blocking operations break the connection, thus releasing the blocked thread. The client should have one as well.

The timeouts set for the connection depend on the implementation and the protocol.
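For reads, `java.net.Socket` offers such a timeout through `setSoTimeout`: a blocked `read()` on the socket's `InputStream` gives up with a `SocketTimeoutException` after the given number of milliseconds. The sketch below connects over loopback to a peer that never sends anything; the 200 ms value and the `readTimesOut` helper are assumptions made for the illustration. Note that the exception by itself does not break the connection; it is up to the application to close the socket, which the try-with-resources block does here. As far as I know, `setSoTimeout` does not apply to writes, so a write blocked on a slow reader has to be released some other way, for example by closing the socket from another thread.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeout {
    /** Returns true if a read on a connection whose peer never sends anything times out. */
    static boolean readTimesOut(int timeoutMillis) throws IOException {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, lo);
             Socket client = new Socket(lo, listener.getLocalPort());
             Socket silentPeer = listener.accept()) { // accepted, but never writes anything
            client.setSoTimeout(timeoutMillis);       // applies to reads on client's InputStream
            InputStream in = client.getInputStream();
            try {
                in.read();                            // would otherwise block indefinitely
                return false;
            } catch (SocketTimeoutException e) {
                return true;                          // thread released; closing is our decision
            }
        }                                             // the connection is closed here
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut(200));
    }
}
```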

No, it does not need to hold all the data. It just advances forward through the file (usually using buffered data). The stream can discard old buffers as it pleases.

Note that there are a lot of very different implementations of InputStream, so the exact behaviour varies a lot.

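Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.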

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM