I'm attempting to write a Python 2/3 compatible routine to fetch a CSV file, decode it from latin_1 into Unicode and feed it to a csv.DictReader in a robust, scalable manner.
I'm using:
- python-future, including importing open from builtins, and importing unicode_literals for consistent behaviour
- tempfile.SpooledTemporaryFile
- io.TextIOWrapper to handle decoding from the latin_1 encoding before feeding to DictReader
This all works fine under Python 3.
The problem is that TextIOWrapper expects to wrap a stream which conforms to BufferedIOBase. Unfortunately under Python 2, although I have imported the Python 3-style open, the vanilla Python 2 tempfile.SpooledTemporaryFile still of course returns a Python 2 cStringIO.StringO, instead of a Python 3 io.BytesIO as required by TextIOWrapper.
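For comparison, here is a minimal sketch of the case that does work: an io.BytesIO wrapped in TextIOWrapper and handed to DictReader. It assumes Python 3 and the standard library csv module; rows_from_latin1 is a hypothetical helper name, not part of the routine in the question.

```python
import csv
import io


def rows_from_latin1(payload):
    """Decode latin_1 bytes held in an io.BytesIO and parse them as CSV."""
    raw = io.BytesIO(payload)
    # io.BytesIO provides readable(), writable() and seekable(),
    # the methods TextIOWrapper relies on in the object it wraps.
    text = io.TextIOWrapper(raw, encoding='latin_1', newline='')
    return list(csv.DictReader(text))
```

The cStringIO.StringO object lacks those methods, which is what the AttributeError in the traceback reports.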
I can think of these possible approaches:
1. Wrap the Python 2 cStringIO.StringO as a Python 3-style io.BytesIO. I'm not sure how to approach this - would I need to write such a wrapper or does one already exist?
2. Find a Python 2 alternative that can wrap a cStringIO.StringO stream for decoding. I haven't found one yet.
3. Do away with SpooledTemporaryFile, decode entirely in memory. How big would the CSV file need to be for operating entirely in memory to become a concern?
4. Do away with SpooledTemporaryFile, and implement my own spill-to-disk. This would allow me to call open from python-future, but I'd rather not as it would be very tedious and probably less secure.

What's the best way forward? Have I missed anything?
Imports:
from __future__ import (absolute_import, division,
                        print_function, unicode_literals)
from builtins import (ascii, bytes, chr, dict, filter, hex, input,  # noqa
                      int, map, next, oct, open, pow, range, round,  # noqa
                      str, super, zip)  # noqa
import csv
import tempfile
from io import TextIOWrapper
import requests
Init:
...
self._session = requests.Session()
...
Routine:
def _fetch_csv(self, path):
    raw_file = tempfile.SpooledTemporaryFile(
        max_size=self._config.get('spool_size')
    )
    csv_r = self._session.get(self.url + path)
    for chunk in csv_r.iter_content():
        raw_file.write(chunk)
    raw_file.seek(0)
    text_file = TextIOWrapper(raw_file._file, encoding='latin_1')
    return csv.DictReader(text_file)
Error:
...in _fetch_csv
text_file = TextIOWrapper(raw_file._file, encoding='utf-8')
AttributeError: 'cStringIO.StringO' object has no attribute 'readable'
Not sure whether this will be useful. The situation is only vaguely analogous to yours.
I wanted to use NamedTemporaryFile to create a CSV encoded in UTF-8 with OS-native line endings. That is possibly not quite standard, but it is easily accommodated by using the Python 3 style io.open.
The difficulty is that NamedTemporaryFile in Python 2 opens a byte stream, which causes problems with line endings. The solution I settled on, which I think is a bit nicer than separate cases for Python 2 and 3, is to create the temp file, close it, and reopen it with io.open. The final piece is the excellent backports.csv library, which provides Python 3 style CSV handling in Python 2.
from __future__ import absolute_import, division, print_function, unicode_literals
from builtins import str
import io
import os
import tempfile
from backports import csv

data = [["1", "1", "John Coltrane", 1926],
        ["2", "1", "Miles Davis", 1926],
        ["3", "1", "Bill Evans", 1929],
        ["4", "1", "Paul Chambers", 1935],
        ["5", "1", "Scott LaFaro", 1936],
        ["6", "1", "Sonny Rollins", 1930],
        ["7", "1", "Kenny Burrel", 1931]]

## create CSV file
with tempfile.NamedTemporaryFile(delete=False) as temp:
    filename = temp.name

with io.open(filename, mode='w', encoding="utf-8", newline='') as temp:
    writer = csv.writer(temp, quoting=csv.QUOTE_NONNUMERIC, lineterminator=str(os.linesep))
    headers = ['X', 'Y', 'Name', 'Born']
    writer.writerow(headers)
    for row in data:
        print(row)
        writer.writerow(row)
@cbare's approach should probably be avoided. It does work, but here is what happens with it:
1. We call tempfile.NamedTemporaryFile() to create a temporary file. We then remember its name.
2. We leave the with statement and that file is closed.
3. We reopen the file by name with io.open().

At first glance it looks okay, and at second glance too. But I am not sure whether on some platforms (like nt) it might be possible to remove another user's file while it is not open, and then create it again and so gain access to its contents. Please somebody correct me if this is not possible.
Here is what I would suggest instead:
# Create temporary file
with tempfile.NamedTemporaryFile() as tf_oldstyle:
    # get its file descriptor - note that it will also work with tempfile.TemporaryFile
    # which has no meaningful name at all
    fd = tf_oldstyle.fileno()
    # open that fd with io.open, using desired mode (could use binary mode or whatever)
    tf = io.open(fd, 'w+', encoding='utf-8', newline='')
    # note we don't use a with statement here, because this fd will be closed once we leave the outer with block
    # now work with the tf
    writer = csv.writer(tf, ...)
    writer.writerow(...)
# At this point, fd is closed, and the file is deleted.
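One detail worth spelling out is who owns the descriptor and when buffered text reaches the file. Below is a minimal runnable sketch of the same idea, assuming Python 3 and the standard library csv module (on Python 2 a text-mode writer such as backports.csv would be needed). write_rows_via_fd is a hypothetical helper name, and passing closefd=False is my assumption about how to keep the io.open wrapper from closing a descriptor that the temporary file object still owns.

```python
import csv
import io
import tempfile


def write_rows_via_fd(rows):
    """Write rows as CSV through a text wrapper around a temp file's descriptor."""
    with tempfile.NamedTemporaryFile() as raw:
        # closefd=False: the NamedTemporaryFile keeps ownership of the
        # descriptor, so closing the wrapper only flushes it.
        tf = io.open(raw.fileno(), 'w+', encoding='utf-8', newline='',
                     closefd=False)
        try:
            csv.writer(tf).writerows(rows)
            tf.flush()   # push buffered text to the file before reading back
            tf.seek(0)
            return tf.read()
        finally:
            tf.close()
    # Leaving the with block closes the descriptor and deletes the file.
```

Flushing before the outer with block exits matters because the text wrapper keeps its own buffer, separate from the temporary file object.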
Or we could directly use tempfile.mkstemp(), which creates the file and returns its fd and name as a tuple, although using *TemporaryFile is probably more secure and more portable between platforms.
fd, name = tempfile.mkstemp()
try:
    tf = io.open(fd, 'w+', encoding='utf-8', newline='')
    writer = csv.writer(tf, ...)
    writer.writerow(...)
finally:
    os.close(fd)
    os.unlink(name)
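A runnable variant of the mkstemp() route might look like the sketch below. It assumes Python 3 and the standard library csv module; write_rows_mkstemp is a hypothetical helper name. Here the io.open wrapper is allowed to own the descriptor, so closing the wrapper both flushes the text buffer and closes the fd, and only the unlink is left for the finally clause.

```python
import csv
import io
import os
import tempfile


def write_rows_mkstemp(rows):
    """Write rows as CSV to a mkstemp() file; return its name and contents."""
    fd, name = tempfile.mkstemp()
    try:
        # The wrapper takes ownership of fd; leaving the with block
        # flushes the text buffer and closes the descriptor.
        with io.open(fd, 'w+', encoding='utf-8', newline='') as tf:
            csv.writer(tf).writerows(rows)
            tf.seek(0)
            return name, tf.read()
    finally:
        # Remove the file once the descriptor has been closed.
        os.unlink(name)
```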
I would try subclassing SpooledTemporaryFile under Python 2 and overriding its rollover method.
Warning: this is not tested.
import io
import sys
import tempfile

if sys.version_info >= (3,):
    SpooledTemporaryFile = tempfile.SpooledTemporaryFile
else:
    class SpooledTemporaryFile(tempfile.SpooledTemporaryFile):
        def __init__(self, max_size=0, mode='w+b', **kwargs):
            # Explicit base-class call rather than super(): this works whether
            # or not the Python 2 base class is a new-style class.
            tempfile.SpooledTemporaryFile.__init__(self, max_size, mode, **kwargs)
            # replace cStringIO with io.BytesIO or io.StringIO
            if 'b' in mode:
                self._file = io.BytesIO()
            else:
                self._file = io.StringIO(newline='\n')  # see python3's tempfile sources for reason

        def rollover(self):
            if self._rolled:
                return
            # call the base implementation and then replace underlying file object
            tempfile.SpooledTemporaryFile.rollover(self)
            fd = self._file.fileno()
            name = self._file.name
            mode = self._file.mode
            delete = self._file.delete
            pos = self._file.tell()
            # self._file is a tempfile._TemporaryFileWrapper.
            # It caches methods so we cannot just replace its .file attribute,
            # so let's create another _TemporaryFileWrapper
            file = io.open(fd, mode)
            file.seek(pos)
            self._file = tempfile._TemporaryFileWrapper(file, name, delete)
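Whichever class ends up bound to SpooledTemporaryFile, the property worth checking is that the wrapped object decodes correctly on both sides of the rollover threshold. A minimal sketch of such a check, assuming Python 3 and the standard library class; read_latin1_csv is a hypothetical helper, and _file is the same private attribute the question relies on.

```python
import csv
import io
import tempfile


def read_latin1_csv(payload, max_size):
    """Spool payload bytes, then decode them as latin_1 CSV rows."""
    with tempfile.SpooledTemporaryFile(max_size=max_size) as spool:
        spool.write(payload)
        spool.seek(0)
        # _file is private: an io.BytesIO before rollover,
        # a real temporary file once max_size has been exceeded.
        text = io.TextIOWrapper(spool._file, encoding='latin_1', newline='')
        return list(csv.DictReader(text))
```

Calling it once with a max_size larger than the payload and once with a very small max_size exercises the in-memory and on-disk paths respectively.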