mmap in Python prints binary data instead of text

Question

I am trying to read a big file of 30 MB character by character. I found an interesting article on how to read a big file: Fast Method to Stream Big files

Problem: The output prints binary data instead of human-readable text.

Code:

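import gzip
import mmap
import random


def getRow(filepath):
    offsets = get_offsets(filepath)
    random.shuffle(offsets)
    with gzip.open(filepath, "rb") as f:
        # Note: f.fileno() belongs to the underlying compressed file,
        # so this maps the raw .gz bytes, not the decompressed text.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        for position in offsets:
            mm.seek(position)
            record = mm.readline()
            x = record.split(b",")
            yield x


def get_offsets(input_filename):
    offsets = []
    with open(input_filename, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        loc = 0
        # b"" (not "") is the EOF sentinel: mmap.readline() returns bytes.
        for record in iter(mm.readline, b""):
            offsets.append(loc)  # start offset of the line just read
            loc = mm.tell()      # start offset of the next line
    return offsets


for line in getRow("hello.dat.gz"):
    print(line)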

Output: The output is some weird binary data:

['w\xc1\xd9S\xabP8xy\x8f\xd8\xae\xe3\xd8b&\xb6"\xbeZ\xf3P\xdc\x19&H\\@\x8e\x83\x0b\x81?R\xb0\xf2\xb5\xc1\x88rJ\

Am I doing something terribly stupid?

EDIT:

I found the problem: it is because of `gzip.open`. I'm not sure how to get around this. Any ideas?

Answer 1

As per the documentation of `GzipFile`:

fileno(self)

 Invoke the underlying file object's `fileno()` method.

You are mapping a view of the compressed .gz file, not a view of the decompressed data.
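A minimal Python 3 sketch of this, assuming a small throwaway file `demo.gz` (hypothetical demo data) in a temporary directory: the mapping built from `GzipFile.fileno()` starts with the gzip magic number `\x1f\x8b`, while reading through the `GzipFile` object itself returns the decompressed text.

```python
import gzip
import mmap
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical demo file: a small CSV-like payload, gzip-compressed.
    path = os.path.join(tmpdir, "demo.gz")
    with gzip.open(path, "wb") as f:
        f.write(b"a,b,c\n1,2,3\n")

    with gzip.open(path, "rb") as f:
        # f.fileno() is the descriptor of the .gz file on disk, so this
        # mapping covers the *compressed* bytes, as in the question.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        mapped_start = mm[:2]
        mm.close()
        # Reading through the GzipFile object itself does decompress.
        decompressed_line = f.readline()

print(mapped_start)       # b'\x1f\x8b' (the gzip magic number)
print(decompressed_line)  # b'a,b,c\n'
```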

`mmap()` can only operate on OS file handles; it cannot map arbitrary Python file objects.
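To illustrate the point, here is a small Python 3 sketch using an in-memory gzip stream (`io.BytesIO`), which is a valid Python file object with no OS file handle behind it. `GzipFile.fileno()` simply forwards to the wrapped object, so there is no descriptor to hand to `mmap`. The helper `try_mmap` is hypothetical, written only for this demonstration.

```python
import gzip
import io
import mmap

# An in-memory gzip stream: a working file object, but no OS file handle.
buf = io.BytesIO(gzip.compress(b"a,b,c\n"))
gz = gzip.GzipFile(fileobj=buf)


def try_mmap(fileobj):
    """Return an mmap of fileobj, or the exception raised while trying."""
    try:
        return mmap.mmap(fileobj.fileno(), 0, access=mmap.ACCESS_READ)
    except io.UnsupportedOperation as exc:
        return exc


result = try_mmap(gz)         # GzipFile.fileno() -> BytesIO.fileno() fails
text = gz.read()              # ...but reading through the object still works

print(type(result).__name__)  # UnsupportedOperation
print(text)                   # b'a,b,c\n'
```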

So no, you cannot transparently map a decompressed view of a compressed file unless the underlying operating system supports this directly.
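One common workaround (not part of the answer above, and sketched here assuming there is enough disk space for the decompressed data) is to decompress the .gz once into a temporary file and `mmap` that plain file instead. The offset-and-shuffle idea from the question then works on the uncompressed bytes. The function names `decompress_to_temp` and `random_rows` are hypothetical.

```python
import gzip
import mmap
import os
import random
import shutil
import tempfile


def decompress_to_temp(gz_path):
    """Decompress gz_path into a temporary file and return its path."""
    fd, tmp_path = tempfile.mkstemp(suffix=".dat")
    with gzip.open(gz_path, "rb") as src, os.fdopen(fd, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return tmp_path


def random_rows(gz_path):
    """Yield the comma-separated fields of every line, in random order."""
    plain_path = decompress_to_temp(gz_path)
    try:
        with open(plain_path, "rb") as f:
            # Note: mmap raises ValueError for an empty file; not handled here.
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            try:
                # Start offset of every line in the *uncompressed* data.
                offsets = []
                pos = 0
                for _ in iter(mm.readline, b""):
                    offsets.append(pos)
                    pos = mm.tell()
                random.shuffle(offsets)
                for pos in offsets:
                    mm.seek(pos)
                    yield mm.readline().rstrip(b"\r\n").split(b",")
            finally:
                mm.close()
    finally:
        os.remove(plain_path)
```

Usage would look like `for row in random_rows("hello.dat.gz"): print(row)`. For a file of about 30 MB it may be simpler still to skip `mmap` and read the decompressed bytes into memory with `gzip.open(path, "rb").read()`; the temporary-file route mainly pays off when the decompressed data is too large to hold in RAM.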

Answer 1: accepted, 2017-06-19 05:51:02
