
How to display a Unicode character in Python

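I have a text file that contains accented characters, for example "č", "š", "ž". When I read this file with a Python program and put its content into a Python list, the accented characters are lost: Python replaces them with other characters. For example, "č" is replaced by "_". Does anyone know how to keep accented characters in a Python program when reading them from a file? My code: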

import sqlite3 #to work with relational DB

conn = sqlite3.connect('contacts.sqlite') #connect to db 
cur = conn.cursor() #db connection handle

cur.execute("DROP TABLE IF EXISTS contacts")

cur.execute("CREATE TABLE contacts (id INTEGER, name TEXT, surname  TEXT, email TEXT)")

fname = "acos_ibm_notes_contacts - test.csv"
fh = open(fname) #file handle
print " "
print "Reading", fname
print " "

#--------------------------------------------------
#First build a Python list with new contacts data: name, surname and email address

lst = list() #temporary list to hold content of the file
new_contact_list = list() #this list will contain contacts data: name, surname and email address
count = 0 # to count number of contacts
id = 1 #will be used to add contacts id into the DB
for line in fh: #for every line in the file handle
    new_contact = list()
    name = ''
    surname = ''
    mail = ''
    #split line into tokens at each '"' character and put tokens into  the temporary list
    lst = line.split('"')
    if lst[1] == ',': continue #if there is no first name, move to next line
    elif lst[1] != ',': #if 1st element of list is not empty
        name = lst[1] #this is the name
        if name[-1] == ',': #If last character in name is ','
            name = name[:-1] #delete it
        new_contact.append({'Name':name}) #add first name to new list of contacts
        if lst[5] != ',': #if there is a last name in the contact data
            surname = lst[5] #assign 5th element of the list to surname
            if surname[0] == ',': #If first character in surname is ','
                surname = surname[1:] #delete it
            if surname[-1] == ',': #If last character in surname is ','
                surname = surname[:-1] #delete it
            if ',' in surname: #if surname and mail are merged in same list element
                sur_mail = surname.split(',') #split them at the ','
                surname = sur_mail[0]
                mail = sur_mail[1]
            new_contact.append({'Surname':surname}) #add last name to new list of contacts
            new_contact.append({'Mail':mail}) #add mail address to new list of contacts
        new_contact_list.append(new_contact)
    count = count + 1

fh.close()
#--------------------------------------------------
# Second: populate the DB with data from the new_contact_list

row = cur.fetchone()
id = 1
for i in range(count):
    entry = new_contact_list[i] #every row in the list has data about 1 contact - put it into variable
    name_dict = entry[0] #First element is a dictionary with name data
    surname_dict = entry[1] #Second element is a dictionary with surname data
    mail_dict = entry[2] #Third element is a dictionary with mail data
    name = name_dict['Name']
    surname = surname_dict['Surname']
    mail = mail_dict['Mail']
    cur.execute("INSERT INTO contacts VALUES (?, ?, ?, ?)", (id, name, surname, mail))
    id = id + 1               

conn.commit() # Commit outstanding changes to disk 

-----------------------------------

Here is a simplified version of the program, without the database, that only prints to the screen:

import io
fh = io.open("notes_contacts.csv", encoding="utf_16_le") #file handle

lst = list() #temporary list to hold content of the file
new_contact_list = list() #this list will contain the contact name,    surname and email address
count = 0 # to count number of contacts
id = 1 #will be used to add contacts id into the DB
for line in fh: #for every line in the file handle
    print "Line from file:\n", line # print it for debugging purposes
    new_contact = list()
    name = ''
    surname = ''
    mail = ''
    #split line into tokens at each '"' character and put tokens into  the temporary list
    lst = line.split('"')
    if lst[1] == ',': continue #if there is no first name, move to next line
    elif lst[1] != ',': #if 1st element of list is not empty
        name = lst[1] #this is the name
        print "Name in variable:", name # print it for debugging purposes
        if name[-1] == ',': #If last character in name is ','
            name = name[:-1] #delete it
            new_contact.append({'Name':name}) #add first name to new list of contacts
        if lst[5] != ',': #if there is a last name in the contact data
            surname = lst[5] #assign 5th element of the list to surname
            print "Surname in variable:", surname # print it for debugging purposes
            if surname[0] == ',': #If first character in surname is ','
                surname = surname[1:] #delete it
            if surname[-1] == ',': #If last character in surname is ','
                surname = surname[:-1] #delete it
            if ',' in surname: #if surname and mail are merged in same list element
                sur_mail = surname.split(',') #split them at the ','
                surname = sur_mail[0]
                mail = sur_mail[1]
            new_contact.append({'Surname':surname}) #add last name to new list of contacts
            new_contact.append({'Mail':mail}) #add mail address to new list of contacts
        new_contact_list.append(new_contact)
        print "New contact within the list:", new_contact # print it for debugging purposes

fh.close()

This is the content of the notes_contacts.csv file, which has only one line:

Aco,"",Vidovič,aco.vidovic@si.ibm.com,+38613208872,"",+38640456872,"","","","","","","","",""

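In Python 2.7, files opened with the built-in open() are read as raw bytes, with no decoding. Instead, you need to open the file in text mode and have it decoded, as Python 3 does. You do not have to decode text while reading a file, but doing so saves you from worrying about encodings later in your code.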

Add to the top:

import io

Change the open() call to:

fh = io.open(fname, encoding='utf_16_le')

Note: you always need to pass encoding, because Python cannot guess the encoding on its own.

Now, every time you read(), the text is converted to a Unicode string.
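A minimal sketch of that round trip, assuming a throwaway file in a temporary directory (the file name and the sample text are made up for illustration). It writes one UTF-16-LE line, reads it back through io.open, and compares the result with decoding the same bytes using a wrong codec:

```python
import io
import os
import tempfile

# Hypothetical sample data; "\u010d" is the character "č".
sample = u"Aco,Vidovi\u010d\n"

# Hypothetical throwaway file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sample_utf16.csv")

# newline="" keeps the "\n" untouched on every platform.
with io.open(path, "w", encoding="utf_16_le", newline="") as out:
    out.write(sample)

# Text mode with an explicit encoding: the line arrives already decoded.
with io.open(path, encoding="utf_16_le") as fh:
    decoded_line = fh.readline()

# Binary mode: raw bytes that still have to be decoded by hand.
with io.open(path, "rb") as fh:
    raw_bytes = fh.read()

# A wrong codec does not raise here, it just yields the wrong text.
wrong_guess = raw_bytes.decode("latin-1")

print(repr(decoded_line))
print(repr(wrong_guess))
```

The second print shows why the encoding has to be stated explicitly: the bytes decode without an error under latin-1, but the result is not the original text.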

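The sqlite3 module accepts TEXT as either Unicode or a UTF-8 encoded str. Since you have already decoded the text to Unicode, you do not need to do anything else.

To make sure SQLite does not try to encode the body of the SQL command back to an ASCII string, turn the SQL command into a Unicode string by prefixing the string literal with u.

For example: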

cur.execute(u"INSERT INTO contacts VALUES (?, ?, ?, ?)", (id, name, surname, mail))
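To check that the accented characters survive the trip through SQLite, something like the following sketch can help. It assumes an in-memory database and a made-up contact, so it does not touch contacts.sqlite:

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(u"CREATE TABLE contacts (id INTEGER, name TEXT, surname TEXT, email TEXT)")

# Hypothetical contact; "\u010d" is the character "č".
contact = (1, u"Aco", u"Vidovi\u010d", u"aco@example.com")
cur.execute(u"INSERT INTO contacts VALUES (?, ?, ?, ?)", contact)
conn.commit()

# Read the surname back and compare it with what was inserted.
cur.execute(u"SELECT surname FROM contacts WHERE id = ?", (1,))
stored_surname = cur.fetchone()[0]
conn.close()

print(repr(stored_surname))
```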

Python 3 helps you avoid some of these quirks; there, all you need to do to make it work is:

fh = io.open(fname, encoding='utf_16_le')

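Since your data looks like standard Excel-dialect CSV, you can use the csv module to split it. DictReader lets you pass in the column names, which makes parsing the fields very easy. Unfortunately, the csv module in Python 2.7 is not Unicode-safe, so you need to use the Python 3 backport: https://github.com/ryanhiebert/backports.csv

Your code can be simplified to: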

from backports import csv
import io

csv_fh = io.open('contacts.csv', encoding='utf_16_le')

field_names = [u'first_name', u'middle_name', u'surname', u'email',
               u'phone_office', u'fax', u'phone_mobile', u'inside_leg_measurement']

csv_reader = csv.DictReader(csv_fh, fieldnames=field_names)

for row in csv_reader:
    if not row['first_name']: continue

    print u"First Name: {first_name}, " \
          u"Surname: {surname} " \
          u"Email: {email}".format(first_name=row['first_name'],
                                   surname=row['surname'],
                                   email=row['email'])
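On Python 3 the standard csv module works on Unicode text directly, so the backport should not be needed. Below is a rough Python 3 sketch of the same idea. It assumes an in-memory string in place of the decoded file, made-up contact values, and a hypothetical restkey name ('extra') to collect the columns that have no field name:

```python
import csv
import io

# Field names taken from the example above.
field_names = ['first_name', 'middle_name', 'surname', 'email',
               'phone_office', 'fax', 'phone_mobile', 'inside_leg_measurement']

# Hypothetical stand-in for an already-decoded file; "\u010d" is "č".
csv_text = 'Aco,"",Vidovi\u010d,aco@example.com,+38600000000,"",+38600000001,"","",""\n'
csv_fh = io.StringIO(csv_text, newline='')

# Columns beyond the named ones are collected in a list under the 'extra' key.
csv_reader = csv.DictReader(csv_fh, fieldnames=field_names, restkey='extra')

# Skip rows without a first name, as in the original loop.
rows = [row for row in csv_reader if row['first_name']]

for row in rows:
    print("First Name: {first_name}, Surname: {surname}, Email: {email}".format(**row))
```

With a real file, the StringIO object would be replaced by open(fname, encoding='utf_16_le', newline='').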

Try putting # coding=utf-8 on the first line of your program.
