
Python - How to use custom buffer_size in io.BufferedReader?

From what I understand, the buffer_size argument to io.BufferedReader is supposed to control the read buffer size passed to the underlying reader.

However, I'm not seeing that behavior. When I call reader.read() to read the entire file, io.DEFAULT_BUFFER_SIZE is used and buffer_size is ignored. When I call reader.read(length), length is used as the buffer size, and the buffer_size argument is again ignored.

Minimal example:

import io

class MyReader(io.RawIOBase):

    def __init__(self, length):
        self.length = length
        self.position = 0

    def readinto(self, b):
        print('read buffer length: %d' % len(b))
        length = min(len(b), self.length - self.position)
        self.position += length
        b[:length] = b'a' * length
        return length

    def readable(self):
        return True

    def seekable(self):
        return False


print('# read entire file')
reader = io.BufferedReader(MyReader(20000), buffer_size=100)
print('output length: %d' % len(reader.read()))

print('\n# read part of file')
reader = io.BufferedReader(MyReader(20000), buffer_size=100)
print('output length: %d' % len(reader.read(10000)))

print('\n# read beyond end of file')
reader = io.BufferedReader(MyReader(20000), buffer_size=100)
print('output length: %d' % len(reader.read(30000)))

Outputs:

# read entire file
read buffer length: 8192
read buffer length: 8192
read buffer length: 8192
read buffer length: 8192
read buffer length: 8192
output length: 20000

# read part of file
read buffer length: 10000
output length: 10000

# read beyond end of file
read buffer length: 30000
read buffer length: 10000
output length: 20000

Am I misunderstanding how the BufferedReader is supposed to work?

The point of BufferedReader is to keep an internal buffer, whose size you set with buffer_size. That buffer is used to satisfy smaller reads, avoiding many small read calls on a slower I/O device.

The buffer does not try to limit the size of reads, however!

From the io.BufferedReader documentation:

When reading data from this object, a larger amount of data may be requested from the underlying raw stream, and kept in an internal buffer. The buffered data can then be returned directly on subsequent reads.

The object inherits from io.BufferedIOBase, which states:

The main difference with RawIOBase is that methods read(), readinto() and write() will try (respectively) to read as much input as requested or to consume all given output, at the expense of making perhaps more than one system call.

Because you called .read() on the object, larger blocks are read from the wrapped object to read all data to the end. The internal buffer that the BufferedReader() instance holds doesn't come into play here; you asked for all the data, after all.

The buffer would come into play if you read in smaller blocks:

>>> reader = io.BufferedReader(MyReader(2048), buffer_size=512)
>>> __ = reader.read(42)  # initial read, fill buffer
read buffer length: 512
>>> __ = reader.read(123)  # within the buffer, no read to underlying file needed
>>> __ = reader.read(456)  # deplete buffer, another read needed to re-fill
read buffer length: 512
>>> __ = reader.read(123)  # within the buffer, no read to underlying file needed
>>> __ = reader.read()     # read until end, uses larger blocks to read from wrapped file
read buffer length: 8192
read buffer length: 8192
read buffer length: 8192
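If the goal is to cap the size of every read issued to the underlying raw stream, one option (a sketch building on the MyReader class above, not part of the original answer) is read1(), which performs at most one raw read per call; when the buffer is empty, that raw read refills the buffer and is therefore buffer_size bytes:

```python
import io

class MyReader(io.RawIOBase):
    """Raw stream that records the size of every buffer passed to readinto()."""

    def __init__(self, length):
        self.length = length
        self.position = 0
        self.sizes = []  # sizes of the raw read buffers, for inspection

    def readinto(self, b):
        self.sizes.append(len(b))
        length = min(len(b), self.length - self.position)
        self.position += length
        b[:length] = b'a' * length
        return length

    def readable(self):
        return True

raw = MyReader(20000)
reader = io.BufferedReader(raw, buffer_size=100)

total = 0
while chunk := reader.read1(100):  # at most one raw read per call
    total += len(chunk)

print('output length: %d' % total)              # 20000
print('largest raw read: %d' % max(raw.sizes))  # 100: refills use buffer_size
```

Note that read1() may return fewer bytes than requested, so it only works inside a loop like the one above; a bare read1(n) is not guaranteed to produce n bytes even before end-of-file.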
