I have a text file that contains accented characters such as 'č', 'š' and 'ž'. When I read this file in a Python program and put its content into a Python list, the accented characters are lost: Python replaces them with other characters. For example, 'č' is replaced by '_'. Does anyone know how I can keep the accented characters when I read them from a file? My code:
import sqlite3 #to work with relational DB
conn = sqlite3.connect('contacts.sqlite') #connect to db
cur = conn.cursor() #db connection handle
cur.execute("DROP TABLE IF EXISTS contacts")
cur.execute("CREATE TABLE contacts (id INTEGER, name TEXT, surname TEXT, email TEXT)")
fname = "acos_ibm_notes_contacts - test.csv"
fh = open(fname) #file handle
print " "
print "Reading", fname
print " "
#--------------------------------------------------
#First build a Python list with new contacts data: name, surname and email address
lst = list() #temporary list to hold content of the file
new_contact_list = list() #this list will contain contacts data: name, surname and email address
count = 0 # to count number of contacts
id = 1 #will be used to add contacts id into the DB
for line in fh: #for every line in the file handle
    new_contact = list()
    name = ''
    surname = ''
    mail = ''
    #split line into tokens at each '"' character and put tokens into the temporary list
    lst = line.split('"')
    if lst[1] == ',': continue #if there is no first name, move to next line
    elif lst[1] != ',': #if 1st element of list is not empty
        name = lst[1] #this is the name
        if name[-1] == ',': #If last character in name is ','
            name = name[:-1] #delete it
        new_contact.append({'Name':name}) #add first name to new list of contacts
    if lst[5] != ',': #if there is a last name in the contact data
        surname = lst[5] #assign 5th element of the list to surname
        if surname[0] == ',': #If first character in surname is ','
            surname = surname[1:] #delete it
        if surname[-1] == ',': #If last character in surname is ','
            surname = surname[:-1] #delete it
        if ',' in surname: #if surname and mail are merged in same list element
            sur_mail = surname.split(',') #split them at the ','
            surname = sur_mail[0]
            mail = sur_mail[1]
    new_contact.append({'Surname':surname}) #add last name to new list of contacts
    new_contact.append({'Mail':mail}) #add mail address to new list of contacts
    new_contact_list.append(new_contact)
    count = count + 1
fh.close()
#--------------------------------------------------
# Second: populate the DB with data from the new_contact_list
row = cur.fetchone()
id = 1
for i in range(count):
    entry = new_contact_list[i] #every row in the list has data about 1 contact - put it into variable
    name_dict = entry[0] #First element is a dictionary with name data
    surname_dict = entry[1] #Second element is a dictionary with surname data
    mail_dict = entry[2] #Third element is a dictionary with mail data
    name = name_dict['Name']
    surname = surname_dict['Surname']
    mail = mail_dict['Mail']
    cur.execute("INSERT INTO contacts VALUES (?, ?, ?, ?)", (id, name, surname, mail))
    id = id + 1
conn.commit() # Commit outstanding changes to disk
Updated code, opening the file with an explicit encoding:
import io
fh = io.open("notes_contacts.csv", encoding="utf_16_le") #file handle
lst = list() #temporary list to hold content of the file
new_contact_list = list() #this list will contain the contact name, surname and email address
count = 0 # to count number of contacts
id = 1 #will be used to add contacts id into the DB
for line in fh: #for every line in the file handle
    print "Line from file:\n", line # print it for debugging purposes
    new_contact = list()
    name = ''
    surname = ''
    mail = ''
    #split line into tokens at each '"' character and put tokens into the temporary list
    lst = line.split('"')
    if lst[1] == ',': continue #if there is no first name, move to next line
    elif lst[1] != ',': #if 1st element of list is not empty
        name = lst[1] #this is the name
        print "Name in variable:", name # print it for debugging purposes
        if name[-1] == ',': #If last character in name is ','
            name = name[:-1] #delete it
        new_contact.append({'Name':name}) #add first name to new list of contacts
    if lst[5] != ',': #if there is a last name in the contact data
        surname = lst[5] #assign 5th element of the list to surname
        print "Surname in variable:", surname # print it for debugging purposes
        if surname[0] == ',': #If first character in surname is ','
            surname = surname[1:] #delete it
        if surname[-1] == ',': #If last character in surname is ','
            surname = surname[:-1] #delete it
        if ',' in surname: #if surname and mail are merged in same list element
            sur_mail = surname.split(',') #split them at the ','
            surname = sur_mail[0]
            mail = sur_mail[1]
    new_contact.append({'Surname':surname}) #add last name to new list of contacts
    new_contact.append({'Mail':mail}) #add mail address to new list of contacts
    new_contact_list.append(new_contact)
    print "New contact within the list:", new_contact # print it for debugging purposes
fh.close()
A sample line from the file:
Aco,"",Vidovič,aco.vidovic@si.ibm.com,+38613208872,"",+38640456872,"","","","","","","","",""
In Python 2.7, the built-in open() returns undecoded byte strings by default. Instead, you need to open the file in text mode with an explicit encoding so that the text is decoded as you read it, as it is in Python 3. You don't have to decode text at read time, but doing so saves you from having to worry about encodings later in your code.
Add to the top:
import io
Change the open() call to:
fh = io.open(fname, encoding='utf_16_le')
Note: You always need to pass in the encoding, as Python can't natively guess it. Now, every time you read(), the text will be converted to a Unicode string.
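To make the difference between decoded text and raw bytes concrete, here is a minimal sketch. It assumes the file really is UTF-16-LE, as suggested above; the temporary file name and the shortened sample row are hypothetical and exist only so the example runs on its own.

```python
import io
import os
import tempfile

# Hypothetical sample row (shortened); the real file may differ.
sample = u'Aco,"",Vidovi\u010d,aco.vidovic@si.ibm.com\n'

# Write the sample as UTF-16-LE so there is something to read back.
path = os.path.join(tempfile.mkdtemp(), 'sample_contacts.csv')
with io.open(path, 'w', encoding='utf_16_le') as out:
    out.write(sample)

# Text mode with an explicit encoding: lines come back as decoded Unicode text.
with io.open(path, encoding='utf_16_le') as fh:
    decoded_line = fh.readline()

# Binary mode: the same content comes back as raw, undecoded bytes.
with io.open(path, 'rb') as fh:
    raw_bytes = fh.read()

# The accented character survives in the decoded text.
surname = decoded_line.split(u',')[2]
```

If the accented characters still look wrong after this change, the assumed encoding is the first thing to check, since a wrong encoding name decodes to different characters rather than always raising an error.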
The SQLite module accepts TEXT as Unicode or UTF-8 encoded str. As you've already decoded your text to Unicode you don't have to do anything else.
To ensure that SQLite doesn't try to encode the main body of your SQL command back to an ASCII string, make the SQL command a Unicode string by prefixing the string literal with a u, e.g.:
cur.execute(u"INSERT INTO contacts VALUES (?, ?, ?, ?)", (id, name, surname, mail))
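As a quick check that Unicode text survives the round trip through SQLite, the following sketch inserts one row and reads it back. It uses an in-memory database and a hard-coded sample contact, both of which are assumptions made to keep the example self-contained; the table layout is the one from the question.

```python
import sqlite3

# In-memory database: an assumption, so nothing is written to disk.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute(u"CREATE TABLE contacts (id INTEGER, name TEXT, surname TEXT, email TEXT)")

# Hypothetical contact with an accented surname, already decoded to Unicode.
contact = (1, u'Aco', u'Vidovi\u010d', u'aco.vidovic@si.ibm.com')
cur.execute(u"INSERT INTO contacts VALUES (?, ?, ?, ?)", contact)
conn.commit()

# Read the surname back; it should be the same Unicode string.
cur.execute(u"SELECT surname FROM contacts WHERE id = ?", (1,))
stored_surname = cur.fetchone()[0]
conn.close()
```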
Python 3 will help you avoid some of these quirks and you'll simply need to do the following to make it work:
fh = io.open(fname, encoding='utf_16_le')
As your data looks like standard Excel-dialect CSV, you can use the csv module to split it. DictReader lets you pass the column names, which makes it very easy to parse your fields. Unfortunately, Python 2.7's csv module is not Unicode-safe, so you need to use the Python 3 backport: https://github.com/ryanhiebert/backports.csv
Your code can be simplified to:
from backports import csv
import io
csv_fh = io.open('contacts.csv', encoding='utf_16_le')
field_names = [u'first_name', u'middle_name', u'surname', u'email',
               u'phone_office', u'fax', u'phone_mobile', u'inside_leg_measurement']
csv_reader = csv.DictReader(csv_fh, fieldnames=field_names)
for row in csv_reader:
    if not row['first_name']: continue
    print u"First Name: {first_name}, " \
          u"Surname: {surname} " \
          u"Email: {email}".format(first_name=row['first_name'],
                                   surname=row['surname'],
                                   email=row['email'])
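For comparison, here is a rough Python 3 sketch of the same idea using only the standard-library csv module, which handles Unicode text natively there. The two sample rows are hypothetical and shortened to eight fields so they match the field names above; an in-memory io.StringIO stands in for the real file.

```python
import csv
import io

# Hypothetical sample data, shortened to eight fields per row.
# The second row has an empty first name and should be skipped.
sample = (
    'Aco,"",Vidovi\u010d,aco.vidovic@si.ibm.com,+38613208872,"",+38640456872,""\n'
    '"","",NoName,none@example.com,"","","",""\n'
)

field_names = ['first_name', 'middle_name', 'surname', 'email',
               'phone_office', 'fax', 'phone_mobile', 'inside_leg_measurement']

# newline='' lets the csv module handle line endings itself.
reader = csv.DictReader(io.StringIO(sample, newline=''), fieldnames=field_names)

# Keep only the rows that have a first name.
contacts = [(row['first_name'], row['surname'], row['email'])
            for row in reader if row['first_name']]
```

With a real file, the io.StringIO object would be replaced by open(fname, encoding='utf_16_le', newline=''), assuming the encoding is the same as above.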
Try putting # coding=utf-8 on the first line of your program.