I need to read each row of a CSV file and check whether the first column of that row contains only valid UTF-8 characters.
Below is a small sample of the data in the CSV file:
Pension Roob,"68233 Kertzmann Mountains Apt. 057, Swiftburgh, NY 18633"
ࠀabaa,"AECS layout main road"
Motel One,"23 Parkstad Germany"
I expected the second line to produce an error, but it does not.
Below is my Python code:
import csv

def is_valid_utf_8(word):
    try:
        check = word.encode('utf-8')
        print(check)
    except UnicodeEncodeError:
        return False
    return True

with open('test.csv') as csvfile:
    rows = csv.reader(csvfile, delimiter=",")
    for row in rows:
        if len(row) == 0:
            continue
        else:
            if not is_valid_utf_8(row[0]):
                print(f"{row} has something wrong")
Is my way of checking for non-UTF-8 characters right, or is my sample data wrong?
Can someone please shed some light on this?
Many thanks in advance.
Suggestion:
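Any ordinary Python str (one without lone surrogates) can be encoded to UTF-8, so word.encode('utf-8') does not raise for your sample and the check always passes. If you want to check whether the string is convertible, you should encode to ascii instead of utf-8. Below is a fix for your is_valid_utf_8 method, renamed is_valid_ascii.
This way you get the error you were expecting for the second row, and it checks what you would like it to.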
def is_valid_ascii(word):
    try:
        print(word)
        check = word.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True
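As a rough illustration of the difference between the two checks, here is a minimal sketch. The helper names is_ascii_only and is_valid_utf_8_bytes are made up for this example and are not part of the original code. It assumes that "valid UTF-8" is a question about raw bytes (for example, a file opened in binary mode), whereas a str that has already been decoded can only be tested for things like being ASCII-only.

```python
def is_ascii_only(word: str) -> bool:
    """Return True if every character of the str can be encoded as ASCII."""
    try:
        word.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


def is_valid_utf_8_bytes(raw: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


# First field of the second row in the question's sample data.
sample = "ࠀabaa"

# Encoding to UTF-8 succeeds, which is why the original check never fails.
encoded = sample.encode("utf-8")
print(encoded)

# The ASCII check does flag the row.
print(is_ascii_only(sample))

# Bytes that are not well-formed UTF-8 are rejected (0xFF never occurs in UTF-8).
print(is_valid_utf_8_bytes(b"\xff\xfeabaa"))
```

If the goal is really to detect malformed UTF-8 in the file itself, one option under these assumptions is to read the file with open('test.csv', 'rb') and pass each raw line to the bytes-level check before parsing it.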