Outputting an in-memory gzip-compressed CSV file with csv.DictWriter?

Question

I want to use a DictWriter from Python's csv module to generate a .csv file that's compressed using GZip. I need to do this entirely in memory, so using local files is out of the question.

However, I'm having trouble with each module's type requirements in Python 3. Assuming I have the general structure right, I can't make the two modules work together, because DictWriter needs to write to an io.StringIO buffer while GZip needs an io.BytesIO object.

So, when I try to do:

buffer = io.BytesIO()
compressed = gzip.GzipFile(fileobj=buffer, mode='wb')
dict_writer = csv.DictWriter(buffer, ["a", "b"], extrasaction="ignore")

I get:

TypeError: a bytes-like object is required, not 'str'

And trying to use io.StringIO with GZip doesn't work either. How can I go about this?
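The two failures described above can be reproduced in isolation. The following is a minimal sketch; the helper names (`raises_type_error`, `csv_into_bytes_buffer`, `gzip_into_text_buffer`) are made up for illustration and are not part of the original question.

```python
import csv
import gzip
import io


def raises_type_error(action):
    """Run `action` and report whether it raised TypeError."""
    try:
        action()
    except TypeError:
        return True
    return False


def csv_into_bytes_buffer():
    # csv writes str, but BytesIO only accepts bytes-like objects.
    csv.DictWriter(io.BytesIO(), ["a", "b"]).writeheader()


def gzip_into_text_buffer():
    # GzipFile writes bytes, but StringIO only accepts str.
    with gzip.GzipFile(fileobj=io.StringIO(), mode="wb") as compressed:
        compressed.write(b"a,b\r\n")


print(raises_type_error(csv_into_bytes_buffer))
print(raises_type_error(gzip_into_text_buffer))
```

Both directions fail for the same reason: one side produces text and the other expects bytes, so something has to encode in between.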

Answer 1

You can use io.TextIOWrapper to seamlessly transform a text stream into a binary one:

import csv
import gzip
import io

buffer = io.BytesIO()
with gzip.GzipFile(fileobj=buffer, mode='wb') as compressed:
    # newline='' stops the wrapper from translating the '\r\n' that csv writes
    with io.TextIOWrapper(compressed, encoding='utf-8', newline='') as wrapper:
        dict_writer = csv.DictWriter(wrapper, ["a", "b"], extrasaction="ignore")
        dict_writer.writeheader()
        dict_writer.writerows([{'a': 1, 'b': 2}, {'a': 4, 'b': 3}])
print(buffer.getvalue())  # dump the compressed binary data

buffer.seek(0)
dict_reader = csv.DictReader(io.TextIOWrapper(gzip.GzipFile(fileobj=buffer, mode='rb'), encoding='utf-8', newline=''))
print(list(dict_reader))  # check that decompressing gives back what we wrote

This outputs:

b'\x1f\x8b\x08\x00\x9c6[\\\x02\xffJ\xd4I\xe2\xe5\xe52\xd41\x02\x92&:\xc6@\x12\x00\x00\x00\xff\xff\x03\x00\x85k\xa2\x9e\x12\x00\x00\x00'
[OrderedDict([('a', '1'), ('b', '2')]), OrderedDict([('a', '4'), ('b', '3')])]
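The same approach can be packaged as a small helper that returns the compressed bytes. This is only a sketch: the function name `rows_to_gzip_csv` and its parameters are made up for illustration and do not come from the answer. It passes `newline=''` to the wrapper, as the csv module documentation recommends for file objects, so the `\r\n` line endings that csv writes are not translated a second time.

```python
import csv
import gzip
import io


def rows_to_gzip_csv(rows, fieldnames, encoding="utf-8"):
    """Return the rows as gzip-compressed CSV bytes, built entirely in memory."""
    buffer = io.BytesIO()
    # GzipFile compresses bytes into `buffer`; TextIOWrapper encodes str to bytes.
    with gzip.GzipFile(fileobj=buffer, mode="wb") as compressed:
        with io.TextIOWrapper(compressed, encoding=encoding, newline="") as wrapper:
            writer = csv.DictWriter(wrapper, fieldnames, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(rows)
    # Both context managers have exited, so the gzip trailer has been written.
    return buffer.getvalue()


# The extra key "c" is dropped because of extrasaction="ignore".
payload = rows_to_gzip_csv([{"a": 1, "b": 2}, {"a": 4, "b": 3, "c": 9}], ["a", "b"])
print(gzip.decompress(payload).decode("utf-8"))
```

Reading `buffer.getvalue()` only after the `with` blocks have exited matters: until the GzipFile is closed, the compressed stream is incomplete.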

Answer 2

A roundabout way would be to write it to an io.StringIO object first and then convert the content to bytes for an io.BytesIO:

s = io.StringIO()
b = io.BytesIO()

dict_writer = csv.DictWriter(s, ["a", "b"], extrasaction="ignore")

... # complete your write operations ...

# Compress the encoded CSV text into the BytesIO buffer
with gzip.GzipFile(fileobj=b, mode='wb') as compressed:
    compressed.write(s.getvalue().encode('utf-8'))  # or an encoding of your choice

... 

s.close()   # Remember to close your streams!
b.close()

Though, as @wwii's comment suggests, depending on the size of your data, it may be more worthwhile to write your own CSV in bytes instead.
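For reference, a complete version of the StringIO route might look like the sketch below. It swaps the explicit GzipFile for the one-shot `gzip.compress` function, which is a simplification and not what the answer itself uses; the variable names are illustrative.

```python
import csv
import gzip
import io

text_buffer = io.StringIO()
writer = csv.DictWriter(text_buffer, ["a", "b"], extrasaction="ignore")
writer.writeheader()
writer.writerows([{"a": 1, "b": 2}, {"a": 4, "b": 3}])

# Encode the whole CSV text once, then compress it in a single call.
compressed_bytes = gzip.compress(text_buffer.getvalue().encode("utf-8"))
text_buffer.close()

print(len(compressed_bytes))
```

This keeps the full CSV text, its encoded bytes, and the compressed result in memory at the same time, which is why the size of the data matters when choosing between the two approaches.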

Answer 1: 3 · 2019-02-06 18:35:19

Answer 2: 1 · accepted · 2019-02-06 18:14:10
