
How to convert rows with unicode data to latin-1 encoded csv data on Py2 and Py3?

I want to convert a nested list containing Unicode values to latin-1 encoded csv (so that I can transfer the result in a web response and have the end user's local Excel open the file).

We're transitioning to Py3, so preferably the same code needs to work for both Py2 and Py3 (for maintenance and coverage reasons).

Our current code, which works on Py2 only:

import csv
from cStringIO import StringIO

def rows_to_csv_data(rows):
    rows = [[col.encode('latin-1') for col in row] for row in rows]
    buf = StringIO()
    writer = csv.writer(buf)
    writer.writerows(rows)
    return buf.getvalue()

A simple test case:

def test_rows_to_csv_data():
    rows = [
        [u'helloæ', u'worldø']
    ]
    binary_data = rows_to_csv_data(rows)
    assert binary_data == u"helloæ,worldø\r\n".encode('latin-1')

    # Update: the data is never written to a file, but sent with a web response:
    response = http.HttpResponse(content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename=hello.csv'
    response.write(binary_data)
    assert response.serialize() == b'Content-Type: text/csv\r\nContent-Disposition: attachment; filename=hello.csv\r\n\r\nhello\xe6,world\xf8\r\n'

I couldn't find any convenient way to do this using the future or six libraries.

Using from io import StringIO gives me (Py3):

Expected :b'hello\xe6,world\xf8\r\n'
Actual   :b'hello\\xe6',b'world\\xf8'\r\n

and Py2:

>       writer.writerows(rows)
E       TypeError: unicode argument expected, got 'str'

Using from io import BytesIO as StringIO works for Py2, but Py3 gives:

rows = [[b'hello\xe6', b'world\xf8']]

    def rows_to_csv_data(rows):
        rows = [[col.encode('latin-1') for col in row] for row in rows]
        buf = StringIO()
        writer = csv.writer(buf)
>       writer.writerows(rows)
E       TypeError: a bytes-like object is required, not 'str'

which is an error message I don't understand in this context...

Is it possible to write a single function that works for both Pythons, or do I need a completely separate function for Py3?

Here's an illustration of the differences between Python 2 and 3 that passes your test. Tested on Python 2.7 and Python 3.6.

# coding: utf-8
import io
import csv
import sys

def rows_to_csv_data(rows):
    if sys.version_info.major == 2:
        rows = [[col.encode('latin1') for col in row] for row in rows]
        buf = io.BytesIO()
    else:
        buf = io.StringIO(newline='')

    writer = csv.writer(buf)
    writer.writerows(rows)

    if sys.version_info.major == 2:
        return buf.getvalue()
    else:
        return buf.getvalue().encode('latin1')

def test_rows_to_csv_data():
    rows = [[u'helloæ', u'worldø']]
    binary_data = rows_to_csv_data(rows)
    assert binary_data == u"helloæ,worldø\r\n".encode('latin-1')

test_rows_to_csv_data()
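
As for the error message in the question: on Python 3 the csv module works with text, so csv.writer passes str objects to the buffer's write() method. A BytesIO only accepts bytes, hence "a bytes-like object is required, not 'str'". That is why the Python 3 branch above writes to a StringIO and encodes to latin-1 only at the very end. The following is a minimal sketch for Python 3 only (on Python 2 the first call fails differently); write_row_to is a hypothetical helper introduced just for this illustration.

```python
import csv
import io


def write_row_to(buf, row):
    """Try to write one CSV row to buf (hypothetical helper).

    Returns the TypeError message if the buffer rejects the data,
    or None if the row was written successfully.
    """
    try:
        csv.writer(buf).writerow(row)
    except TypeError as exc:
        return str(exc)
    return None


row = [u'hello\xe6', u'world\xf8']

# Python 3: csv.writer hands str to buf.write(), which BytesIO rejects.
bytes_error = write_row_to(io.BytesIO(), row)

# A text buffer accepts the str output; encode once, after all rows are written.
text_buf = io.StringIO(newline='')
text_error = write_row_to(text_buf, row)
encoded = text_buf.getvalue().encode('latin-1')

print(bytes_error)
print(encoded)
```

The design point is that encoding happens in exactly one place per Python version: before writing on Python 2 (where csv expects byte strings) and after writing on Python 3 (where csv expects text).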
