
Read CSV function that works in both Python 2 and Python 3 (unicode vs. bytes-like object)

We need to maintain a legacy application while it is being migrated to Python 3 and RHEL 8.

We therefore had to create a backwards-compatible version of it.

It contains a function that reads a CSV file.

In Python 3 we have this:

from io import StringIO
import csv

def read_csv(filename):
    """
    Sanitise and read CSV report
    """

    # lowest number of columns to expect in the header
    sane_columns = 7

    # temporary sanitised CSV
    stream = StringIO()

    with open(filename, encoding="utf-8") as csvfile:
        reader = csv.reader(csvfile)
        temp_writer = csv.writer(stream)
        for csv_row in reader:
            if len(csv_row) >= sane_columns:
                temp_writer.writerow(csv_row)

    # Move stream back to the start
    stream.seek(0)

    dict_reader = csv.DictReader(stream)

    return dict_reader

On Python 2 this gives the following error:

TypeError: unicode argument expected, got 'str'

We then changed the code to work in Python 2:

from io import BytesIO
import csv

def read_csv(filename):
    """
    Sanitise and read CSV report
    """

    # lowest number of columns to expect in the header
    sane_columns = 7

    # temporary sanitised CSV
    stream = BytesIO()

    with open(filename) as csvfile:
        reader = csv.reader(csvfile)
        temp_writer = csv.writer(stream)
        for csv_row in reader:
            if len(csv_row) >= sane_columns:
                temp_writer.writerow(csv_row)

    # Move stream back to the start
    stream.seek(0)

    dict_reader = csv.DictReader(stream)

    return dict_reader

But on Python 3 it gives this error:

TypeError: a bytes-like object is required, not 'str'

How can we refactor the function so that it runs on both versions of Python (2.7+ and 3.6+)?

The CSV that needs to be parsed has some garbage lines. Here is a sample:

some
garbage
lines


Client Name,Policy Name,Status Code,Job Start Time,Job End Time,Job Status,Schedule Name,Schedule Type
xxxxx,WN4_VMWARE_3M,0,"Nov 28, 2021 9:07:38 PM","Nov 28, 2021 9:38:38 PM",Successful,DI3M,Differential Incremental
yyyyyy,WN4_VMWARE_3M,0,"Nov 28, 2021 9:04:52 PM","Nov 28, 2021 9:30:38 PM",Successful,DI3M,Differential Incremental

As an extra challenge, I cannot use the six library: we are not allowed to install pip packages on the servers. :(

I would use this approach: detect which major version of Python is running, do one thing on Python 2, and something else otherwise:

import sys

print(sys.version_info[0])  # major version number: 2 or 3
if sys.version_info[0] < 3:
    pass  # Python 2 specific code goes here
else:
    pass  # Python 3 specific code goes here
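Applied to the function from the question, the version check could look roughly like the sketch below. It is only one possible arrangement, not a tested drop-in replacement: the names `PY2`, `_Buffer` and `_open_csv` are made up for illustration, and `sane_columns` is turned into a keyword argument for convenience. The idea is that on Python 2 the `csv` module works with byte strings, so the file is opened in binary mode and buffered in a `BytesIO`, while on Python 3 it works with text, so the file is opened with an explicit encoding and buffered in a `StringIO`.

```python
import csv
import sys

# Hypothetical helper flag: True when running under Python 2.
PY2 = sys.version_info[0] < 3

if PY2:
    # Python 2: csv reads and writes byte strings
    from io import BytesIO as _Buffer
else:
    # Python 3: csv reads and writes text (unicode) strings
    from io import StringIO as _Buffer


def _open_csv(filename):
    """Open the CSV file in the mode the csv module expects."""
    if PY2:
        return open(filename, "rb")
    # newline="" lets the csv module handle line endings itself
    return open(filename, encoding="utf-8", newline="")


def read_csv(filename, sane_columns=7):
    """
    Sanitise and read CSV report
    """
    # temporary sanitised CSV
    stream = _Buffer()

    with _open_csv(filename) as csvfile:
        temp_writer = csv.writer(stream)
        for csv_row in csv.reader(csvfile):
            # keep only rows with at least `sane_columns` fields
            if len(csv_row) >= sane_columns:
                temp_writer.writerow(csv_row)

    # Move stream back to the start
    stream.seek(0)

    return csv.DictReader(stream)
```

With the sample file from the question, iterating over the returned reader should yield one dictionary per data row, keyed by the header names. Note that on Python 2 the values stay byte strings (UTF-8 encoded if the file is UTF-8), so any non-ASCII text would still need decoding by the caller.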

I'm unsure of the correct solution. However, we once faced a similar problem: the encoding we specified was "utf-8", but a colleague saved the file using Excel, which converted it to some other encoding, and after that the second error started appearing. Try saving the file in proper CSV format. Peace!
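One way to check a file for this kind of re-save is to look at its first bytes. The sketch below assumes, purely as an example, that the "other format" was UTF-8 with a byte order mark (BOM); the post does not say which encoding was actually involved, and `has_utf8_bom` is a hypothetical helper name. It reads the file in binary mode, so it should behave the same on Python 2 and Python 3.

```python
import codecs


def has_utf8_bom(filename):
    """Return True if the file starts with the UTF-8 byte order mark."""
    with open(filename, "rb") as handle:
        # Compare the leading bytes with the 3-byte UTF-8 BOM
        return handle.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8
```

If a BOM is present, on Python 3 the file can be opened with `encoding="utf-8-sig"`, which skips the mark when reading; on Python 2 the three leading bytes would have to be dropped by hand before parsing.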

