
How to read a csv file and then find out if a particular field in the file contains valid utf-8 characters in Python

I have a case in which I am supposed to read a row from a CSV file and then determine whether the first column in that row contains valid UTF-8 characters.

Below is a small data sample from the CSV file I have:

Pension Roob,"68233 Kertzmann Mountains Apt. 057, Swiftburgh, NY 18633"
ࠀabaa,"AECS layout main road"
Motel One,"23 Parkstad Germany"

I was expecting the second line to produce an error, but it does not.

Below is my Python code for doing that:

import csv

def is_valid_utf_8(word):
    try:
        check = word.encode('utf-8')
        print(check)
    except UnicodeEncodeError:
        return False
    return True


with open('test.csv') as csvfile:
    rows = csv.reader(csvfile, delimiter=",")
    for row in rows:
        if len(row) == 0:
            continue
        else:
            if not is_valid_utf_8(row[0]):
                print(f"{row} has something wrong")

Is my way of checking for non-UTF-8 characters right?

Or is the data sample I am using wrong?

Can someone please shed some light on this?

Many thanks in advance.

Suggestion:

If you want to check whether the string is convertible to plain ASCII, you should encode with ascii instead of utf-8. In Python 3, a str is already valid Unicode, so word.encode('utf-8') essentially never raises UnicodeEncodeError, which is why your check always passes. Here is a fix for your is_valid_utf_8 method, renamed is_valid_ascii.
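For context, this is why the original check never fails: any ordinary Python 3 str encodes cleanly to UTF-8, including the non-ASCII value from the second sample row (a minimal sketch):

```python
# Python 3 strings hold Unicode text, so encoding to UTF-8 succeeds
# even for the non-ASCII sample value from the second CSV row.
word = "ࠀabaa"
encoded = word.encode('utf-8')  # no exception is raised
print(encoded)  # b'\xe0\xa0\x80abaa'
```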

This way you get the error you were expecting, and the function checks what you actually intended.

def is_valid_ascii(word):
    """Return True if word contains only ASCII characters."""
    try:
        word.encode('ascii')  # raises UnicodeEncodeError on non-ASCII input
    except UnicodeEncodeError:
        return False
    return True
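Applied to the sample rows from the question, the corrected check flags only the second row. A minimal sketch with the rows hard-coded instead of read from test.csv:

```python
def is_valid_ascii(word):
    """Return True if word contains only ASCII characters."""
    try:
        word.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True

# Hard-coded copy of the sample data from the question.
rows = [
    ["Pension Roob", "68233 Kertzmann Mountains Apt. 057, Swiftburgh, NY 18633"],
    ["ࠀabaa", "AECS layout main road"],
    ["Motel One", "23 Parkstad Germany"],
]

for row in rows:
    if row and not is_valid_ascii(row[0]):
        print(f"{row} has something wrong")  # only the second row is printed
```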
