
Python 3: gzip.open() and modes

https://docs.python.org/3/library/gzip.html

I am considering using gzip.open(), and I am a little confused about the mode argument:

The mode argument can be any of 'r', 'rb', 'a', 'ab', 'w', 'wb', 'x' or 'xb' for binary mode, or 'rt', 'at', 'wt', or 'xt' for text mode. The default is 'rb'.

So what is the difference between 'w' and 'wb'?

The documentation states that they are both binary modes.

So does that mean that there is no difference between 'w' and 'wb'?

It means that r defaults to rb, and if you want text mode you have to specify it explicitly using rt.

(This is the opposite of the built-in open(), where r means rt, not rb.)
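A quick way to see the difference in practice is a small sketch like the following (my own illustration, not taken from the docs). It uses a temporary directory and hypothetical file names, tries to write a str through gzip.open() in plain "w" mode, then writes the same text with "wt" and with the built-in open(), and reads everything back:

```python
import gzip
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    gz_path = os.path.join(tmpdir, "example.txt.gz")   # hypothetical file names
    plain_path = os.path.join(tmpdir, "example.txt")

    # gzip.open() with plain "w" is binary (same as "wb"), so a str is rejected.
    with gzip.open(gz_path, "w") as f:
        try:
            f.write("hello\n")
            str_write_failed = False
        except TypeError:
            str_write_failed = True

    # Text mode has to be spelled out as "wt" for gzip.open().
    with gzip.open(gz_path, "wt", encoding="utf-8") as f:
        f.write("hello\n")
    # The built-in open() treats "w" as text mode ("wt").
    with open(plain_path, "w", encoding="utf-8") as f:
        f.write("hello\n")

    with gzip.open(gz_path, "r") as f:                      # same as "rb": bytes
        gz_r = f.read()
    with gzip.open(gz_path, "rt", encoding="utf-8") as f:   # text: str
        gz_rt = f.read()
    with open(plain_path, "r", encoding="utf-8") as f:      # built-in "r" is "rt": str
        plain_r = f.read()

print(str_write_failed)  # True
print(type(gz_r).__name__, type(gz_rt).__name__, type(plain_r).__name__)  # bytes str str
```

Only the types are compared on purpose: on Windows "wt" translates "\n" to "\r\n", so the raw bytes in gz_r would differ from platform to platform.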

Exactly as you say, and as already covered by @Jean-François Fabre's answer.
I just wanted to show some code, as it was fun.
Let's have a look at the gzip.py source code in the Python standard library to see that this is effectively what happens.
The gzip.open() function can be found here https://github.com/python/cpython/blob/master/Lib/gzip.py and I reproduce it below:

def open(filename, mode="rb", compresslevel=9,
         encoding=None, errors=None, newline=None):
    """Open a gzip-compressed file in binary or text mode.
    The filename argument can be an actual filename (a str or bytes object), or
    an existing file object to read from or write to.
    The mode argument can be "r", "rb", "w", "wb", "x", "xb", "a" or "ab" for
    binary mode, or "rt", "wt", "xt" or "at" for text mode. The default mode is
    "rb", and the default compresslevel is 9.
    For binary mode, this function is equivalent to the GzipFile constructor:
    GzipFile(filename, mode, compresslevel). In this case, the encoding, errors
    and newline arguments must not be provided.
    For text mode, a GzipFile object is created, and wrapped in an
    io.TextIOWrapper instance with the specified encoding, error handling
    behavior, and line ending(s).
    """
    if "t" in mode:
        if "b" in mode:
            raise ValueError("Invalid mode: %r" % (mode,))
    else:
        if encoding is not None:
            raise ValueError("Argument 'encoding' not supported in binary mode")
        if errors is not None:
            raise ValueError("Argument 'errors' not supported in binary mode")
        if newline is not None:
            raise ValueError("Argument 'newline' not supported in binary mode")

    gz_mode = mode.replace("t", "")
    if isinstance(filename, (str, bytes, os.PathLike)):
        binary_file = GzipFile(filename, gz_mode, compresslevel)
    elif hasattr(filename, "read") or hasattr(filename, "write"):
        binary_file = GzipFile(None, gz_mode, compresslevel, filename)
    else:
        raise TypeError("filename must be a str or bytes object, or a file")

    if "t" in mode:
        return io.TextIOWrapper(binary_file, encoding, errors, newline)
    else:
        return binary_file  
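To see the argument checks at the top of gzip.open() in action, here is a small sketch. The raises() helper is just a hypothetical convenience for this example, and io.BytesIO stands in for a real file:

```python
import gzip
import io


def raises(exc_type, func, *args, **kwargs):
    """Return True if func(*args, **kwargs) raises exc_type, else False."""
    try:
        result = func(*args, **kwargs)
    except exc_type:
        return True
    result.close()  # close whatever was opened if no exception occurred
    return False


# "t" and "b" together are rejected by the first check in gzip.open().
mixed_mode = raises(ValueError, gzip.open, io.BytesIO(), "rtb")
# encoding/errors/newline are only accepted in text mode.
encoding_in_binary = raises(ValueError, gzip.open, io.BytesIO(), "wb", encoding="utf-8")
# Neither a path nor an object with read()/write(): the TypeError branch.
not_a_file = raises(TypeError, gzip.open, 42, "rb")

print(mixed_mode, encoding_in_binary, not_a_file)  # True True True
```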

A few things we notice:

  • the default mode is rb, as the documentation you quoted says
  • to open a binary file, it doesn't care whether the mode is, for example, "r" or "rb", "w" or "wb".
    We can see this in the following lines:

      gz_mode = mode.replace("t", "")
      if isinstance(filename, (str, bytes, os.PathLike)):
          binary_file = GzipFile(filename, gz_mode, compresslevel)
      elif hasattr(filename, "read") or hasattr(filename, "write"):
          binary_file = GzipFile(None, gz_mode, compresslevel, filename)
      else:
          raise TypeError("filename must be a str or bytes object, or a file")

      if "t" in mode:
          return io.TextIOWrapper(binary_file, encoding, errors, newline)
      else:
          return binary_file

    Basically, binary_file gets built whether or not there is an additional b, since gz_mode may or may not contain the b at this point.
    Next, the class GzipFile(_compression.BaseStream) is called to build binary_file.
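The normalisation described above can be checked directly. This sketch (again using io.BytesIO as a stand-in for a real file) mirrors the mode.replace("t", "") step and then looks at what gzip.open() actually returns for "w", "wb" and "wt":

```python
import gzip
import io

modes = ("w", "wb", "wt")

# Mirror the normalisation gzip.open() performs: only "t" is stripped,
# so "w" and "wb" reach GzipFile unchanged and "wt" becomes "w".
gz_modes = {m: m.replace("t", "") for m in modes}

kinds = {}
for mode in modes:
    # encoding may only be passed in text mode; gzip.open() rejects it otherwise.
    kwargs = {"encoding": "utf-8"} if "t" in mode else {}
    with gzip.open(io.BytesIO(), mode, **kwargs) as f:
        if isinstance(f, io.TextIOWrapper):
            # Text mode: the GzipFile is wrapped and reachable via .buffer.
            kinds[mode] = ("TextIOWrapper", type(f.buffer).__name__)
        else:
            kinds[mode] = (type(f).__name__,)

print(gz_modes)  # {'w': 'w', 'wb': 'wb', 'wt': 'w'}
print(kinds)     # {'w': ('GzipFile',), 'wb': ('GzipFile',), 'wt': ('TextIOWrapper', 'GzipFile')}
```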

In the GzipFile constructor, the following lines are important:

if mode and ('t' in mode or 'U' in mode):
    raise ValueError("Invalid mode: {!r}".format(mode))
if mode and 'b' not in mode:
    mode += 'b'
if fileobj is None:
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
if filename is None:
    filename = getattr(fileobj, 'name', '')
    if not isinstance(filename, (str, bytes)):
        filename = ''
else:
    filename = os.fspath(filename)
if mode is None:
    mode = getattr(fileobj, 'mode', 'rb')

where it can clearly be seen that if 'b' is not present in the mode, it will be added:

if mode and 'b' not in mode:
    mode += 'b'
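To confirm the effect of the mode += 'b' line, the sketch below opens a temporary file (hypothetical name demo.gz) with GzipFile in "w" and "wb" and inspects the file object that GzipFile opened itself via builtins.open(). Note that it reads gz.fileobj, which is an implementation detail of CPython's gzip.py rather than documented API, so treat this as a diagnostic and not something to rely on:

```python
import gzip
import os
import tempfile

raw_modes = {}
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.gz")  # hypothetical file name
    for mode in ("w", "wb"):
        with gzip.GzipFile(path, mode) as gz:
            # gz.fileobj (undocumented) is the file GzipFile opened itself,
            # after appending 'b' to the mode when it was missing.
            raw_modes[mode] = gz.fileobj.mode

print(raw_modes)  # {'w': 'wb', 'wb': 'wb'}
```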

So, as already discussed, there is no distinction between the two modes.
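As a final sanity check, one can compare the compressed output of the two modes byte for byte. This sketch assumes mtime=0 is passed so the timestamp in the gzip header is fixed (otherwise two runs would differ anyway), and uses an unnamed io.BytesIO so no filename ends up in the header:

```python
import gzip
import io


def compress(mode, payload):
    """Compress payload with GzipFile opened in the given mode; return the raw gzip bytes."""
    buf = io.BytesIO()  # no .name attribute, so no filename is written to the header
    # mtime=0 pins the header timestamp so outputs are comparable.
    with gzip.GzipFile(fileobj=buf, mode=mode, mtime=0) as gz:
        gz.write(payload)
    return buf.getvalue()


payload = b"same bytes either way\n"
out_w = compress("w", payload)
out_wb = compress("wb", payload)

# The read side behaves the same: "r" and "rb" return identical bytes.
with gzip.GzipFile(fileobj=io.BytesIO(out_w), mode="r") as gz:
    read_r = gz.read()
with gzip.GzipFile(fileobj=io.BytesIO(out_w), mode="rb") as gz:
    read_rb = gz.read()

print(out_w == out_wb)     # True
print(read_r == read_rb == payload)  # True
```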

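Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.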

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM