Reading/encoding Chinese characters from a CSV file in Python

Question

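I am trying to read a CSV file that contains Simplified Chinese text and encode it into a request that gets written to a database.

Part of my code: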

#coding:utf-8    
import csv, sys, urllib, urllib2

with open('testdata1.csv', 'rU') as f:
    reader = csv.reader(f)
    try:
        z = csv.reader(f, delimiter='\t')
        for row in reader:
            print row[0]
            if row[0] in (None, ""): 
                continue
            elif row[0] == '家长姓':
                print row[0]

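But I have run into two problems:

1) Sublime Text does not understand the Chinese characters: in the statement elif row[0] == '家长姓', the comparison against '家长姓' never matches.

2) Sublime Text also seems unable to print Chinese characters (when I tell it to print some information, every Chinese character is replaced by an underscore).

I have already tried File > Save with Encoding > UTF-8, to no avail. Any help would be much appreciated.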

Answer 1

Try opening the file with codecs and the appropriate encoding:

>>> import codecs
>>> f = codecs.open("testdata1.csv", "r", "utf-8")
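
A slightly fuller sketch of the same idea, assuming Python 3 (the question uses Python 2, where the csv module works on byte strings, so each cell would need .decode('utf-8') instead). The helper name first_column and the throwaway sample file are made up for illustration; the tab delimiter is taken from the question's code.

```python
import csv
import os
import tempfile


def first_column(path, delimiter="\t", encoding="utf-8"):
    """Return the non-empty first-column values of a delimited text file."""
    values = []
    # newline="" lets the csv module handle line endings itself.
    with open(path, "r", encoding=encoding, newline="") as f:
        for row in csv.reader(f, delimiter=delimiter):
            if row and row[0]:
                values.append(row[0])
    return values


# Build a small throwaway file so the sketch is self-contained.
fd, sample_path = tempfile.mkstemp(suffix=".csv")
with open(fd, "w", encoding="utf-8", newline="") as f:
    f.write("家长姓\t家长名\n\t\n王\t小明\n")

cells = first_column(sample_path)
os.remove(sample_path)

# The header cell is decoded text, so it compares equal to a text literal.
print(cells[0] == "家长姓")
```

Because the file is decoded while it is read, the comparison no longer depends on how the editor saved the script, as long as the encoding argument matches the file.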

Answer 2

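Non-ASCII characters are always awkward to use, because there are 3 separate issues:

the system and the editor must be able to display them
the encoding of the source file must be declared (# -*- coding: ... -*- on the first or second line)
all of the above is independent of the system encoding (sys.stdout.encoding is what gets used to render them when printing)

First, your coding line is missing the -*- markers. Python itself still recognizes #coding:utf-8, but some editors may not pick up the encoding from it.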
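
To see which encodings are actually in play on a given machine, something like the following can help. It is only a diagnostic sketch written for Python 3; the function name encoding_report is made up for illustration, and the values it prints depend entirely on the system.

```python
# -*- coding: utf-8 -*-
import locale
import sys


def encoding_report():
    """Collect the encodings Python uses in different places."""
    return {
        # Used for implicit str/bytes conversions.
        "default": sys.getdefaultencoding(),
        # Used when print() writes to the console; may be None if redirected.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Used for file names.
        "filesystem": sys.getfilesystemencoding(),
        # Used by open() when no encoding argument is given.
        "locale": locale.getpreferredencoding(False),
    }


report = encoding_report()
for name in sorted(report):
    print("%-10s %s" % (name, report[name]))
```

If the stdout entry cannot represent Chinese characters, printing is likely to substitute or fail even when the source file itself is saved correctly.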

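You could also check whether the IDLE editor handles Chinese characters more easily.

In any case, if everything else fails, you can always fall back on explicit unicode escape codes: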

>>> txt = u'家长姓' # only works if the editor and the interpreter agree on the source encoding
>>> txt2 = u'\u5bb6\u957f\u59d3' # works on any system
>>> txt == txt2
True

TL/DR: if you have trouble using non-ASCII characters in Python source code, use their escape codes instead.
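
If you do not know the escape codes for a string, Python can produce them for you. A minimal sketch, assuming Python 3 and the standard unicode_escape codec; the variable names are only illustrative. It also shows that the UTF-8 bytes of a string are not the same thing as its code points.

```python
text = "家长姓"

# ASCII-only spelling of the same string, safe to paste into any source file.
escaped = text.encode("unicode_escape").decode("ascii")
print(escaped)  # \u5bb6\u957f\u59d3

# Interpreting the escapes again gives back the original text.
restored = escaped.encode("ascii").decode("unicode_escape")
print(restored == text)  # True

# The UTF-8 encoding is a sequence of 9 bytes, not 3 code points.
utf8_bytes = text.encode("utf-8")

# Treating each of those bytes as a character yields unrelated text.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake == text)  # False
```

In other words, \uXXXX escapes name characters, while \xNN values copied from a UTF-8 dump name bytes and have to be decoded before they can be compared with text.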

Answer 3

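The '家长姓' in your code is <type 'str'>, and the content you read from the file is also <type 'str'>, but the two may not use the same encoding. You can decode both to <type 'unicode'> before comparing them.

For example:

row[0].decode('utf-8') == u'家长姓'

Here is a small test of str versus unicode: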

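# -*- coding: utf-8 -*-
test = '你好'    # byte string (str)
test1 = u'你好'  # unicode string
print type(test)
print type(test1)
print test == test1  # implicit ASCII decoding fails, so this is False
print type(test.decode('utf-8'))
print test.decode('utf-8') == test1  # decode first, then compare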

Output:

<type 'str'>
<type 'unicode'>
False
<type 'unicode'>
True
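
The same idea can be wrapped in a small helper. This is only a sketch: the name to_text is made up, and the GBK case is an assumption about how such a file might have been saved, not something stated in the question. It is written for Python 3, where the two types are called bytes and str.

```python
def to_text(value, encoding="utf-8"):
    """Decode byte strings to text; pass text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


expected = "家长姓"
cell_utf8 = expected.encode("utf-8")  # the cell as stored in a UTF-8 file
cell_gbk = expected.encode("gbk")     # the same cell stored in a GBK file

# The raw bytes differ, so comparing them directly cannot work.
print(cell_utf8 == cell_gbk)                 # False
# Once decoded with the right encoding, both equal the text literal.
print(to_text(cell_utf8) == expected)        # True
print(to_text(cell_gbk, "gbk") == expected)  # True
```

The comparison only succeeds when the encoding passed to to_text matches the one the file was really saved in, which is why it helps to find that out first.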

3 answers

Answer 1
1 2015-12-11 09:00:47

Answer 2
1 2015-12-11 10:15:56

Answer 3
1 2017-06-27 02:53:34
