
Python 3 UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2

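I know there are several questions on this topic, but I could not find the answer I was looking for, so I will ask anyway. I am a beginner :)

I have this simple function: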

f =[]
def extract_row():
    with open('country_codes.txt') as infile:
        for line in infile:
            x = (line.split()[0])
            f.append(x)
        print (f)
extract_row()

It works on Python 2.7 and gives me the output I want:

['AD', 'AE', 'AF', 'AG', 'AI', 'AL', 'AM', 'AN', 'AO', 'AQ', 'AR'...

But when I try to run it on Python 3.4, I get this error:

Traceback (most recent call last):
  File "/Users/juanlozano/Documents/geonames/extractRow.py", line 8, in <module>   
    extract_row()
  File "/Users/juanlozano/Documents/geonames/extractRow.py", line 4, in extract_row
    for line in infile:
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 31: ordinal not in range(128)

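Can anyone shed some light on this?

These are some of the lines from the txt file I am using: [image]

I ran OCR on your image in Google Drive. The result is not perfect, but it is good enough to reproduce the problem: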

AD AND 20 AN Andorra Andorra la Vella 468. 0 84 EU
AE ARE 784 AE United Arab Emirates Abu Dhabi 82,880.0 4,975, 593 AS
AF AFG 4 AF Afghanistan Kabul 647, 500.0 29, 121,286 AS
AG ATG 28 AC Antigua and Barbuda St. John's 443.0 86,754 NA
AI AIA 660 AV Anguilla The Valley 102.0 13, 254 NA
ALE 8 AL Albania Tirana 28,748,0 2,986, 952 EU
ARM 51 AM Armenia Yerevan 29,800.0 2,968,000 AS
ANT 530 NT Willemstad 960. 0 136, 197 NA 24 A0 Angola Luanda 1,246,700.0 13,068,161 AF
AQ 10 AY Antarctica 14,000,000.0 0 AN
AR B2 AR Argentina Buenos Aires 2,766, 890. 0 41,343, 201 SA
AS 16 AQ American Samoa Pago Pago 199.0 57,881 0C
AT 40 AU Austria Vienna 83,858.0 8,205,000 EU
AU AUS 36 AS Australia Canberra 7,686,850.0 21,515,754 OC
AW AA Aruba Oranjestad 193.0 71,566 NA
AX Åland Mariehamn 1,580.0 26,711 EU
AZ AJ Azerbaijan Baku 86,600.0 8,303,512 AS
BA BK Bosnia and Herzegovina Sarajevo 51, 129.0 4,590,000 EU
BB BB Barbados Bridgetown 431. 0 285,653 NA
BD BG Bangladesh Dhaka 144,000.0 156,118,464 AS
BE BE Belgium Brussels 30,510.0 10,403,000 EU
BF UV Burkina Faso Ouagadougou 274,200.0 16, 241, 811 AF
BG BU Bulgaria Sofia 110,910.0 7, 148,785 EU
BH BA Bahrain Manama | 665.0 738,004 AS
BI BY Burundi Bujumbura 27,830.0 9,863, 117 AF
BJ EN Benin Porto-Novo 112,620.0 9,056,010 AF
BL TB Saint Barthélemy Gustavia 21. 0 8, 45 NA
EM BD Bermuda Hamilton 53.0 65,365 NA
BN BX Brunei Bandar Seri Begawan 5,770.0 395,027 AS
B0 BL Bolivia Sucre 1,098,580,0 9,947, 418 SA
BQ Bonaire_328.0 18,012 NA

Then I typed in your code and added encoding='ascii', like this:

f =[]
def extract_row():
    with open('country_codes.txt',encoding='ascii') as infile:
         for line in infile:
             x = (line.split()[0])
             f.append(x)
         print (f)

extract_row()

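and got the error UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 763: ordinal not in range(128)

So I conclude that Python, for some reason, thinks your source file is ASCII-encoded. Check first by running locale.getpreferredencoding(False), which is the default that open() uses when no encoding is given (sys.getdefaultencoding() always returns 'utf-8' on Python 3 and does not control open()). Do you know the correct encoding of your source file? Try changing the encoding in the open() line (for example to encoding='utf-8' or encoding='iso8859', as suggested above) and see if that helps.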
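To make that check concrete, here is a minimal sketch. It assumes the file is UTF-8, which is plausible because 0xc2 and 0xc3 are typical lead bytes of two-byte UTF-8 sequences, but the question does not confirm it. The sample line and the try_decode helper are made up for illustration.

```python
import locale
import sys

# Default that open() falls back to when no encoding is passed
# (depends on the OS and locale settings).
print("open() default:", locale.getpreferredencoding(False))
# Always 'utf-8' on Python 3; it does not control open().
print("sys default:", sys.getdefaultencoding())

# Hypothetical sample line; "\u00c5" is the non-ASCII letter A with ring.
sample = "AX ALA 248 \u00c5land Mariehamn EU\n"
raw = sample.encode("utf-8")  # that letter becomes the bytes 0xc3 0x85


def try_decode(data, encoding):
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "failed: {}".format(exc)


ascii_result = try_decode(raw, "ascii")        # fails on byte 0xc3
utf8_result = try_decode(raw, "utf-8")         # round-trips cleanly
latin1_result = try_decode(raw, "iso-8859-1")  # no error, but wrong characters

print(ascii_result)
print(repr(utf8_result))
print(repr(latin1_result))
```

Note that ISO 8859-1 never raises, because every byte maps to some character, so a run without errors does not prove the encoding is right. Check the decoded text as well.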

Use the codecs library to solve this. Replace the line that opens the file with this:

import codecs  # goes at the top of the script
with codecs.open('country_codes.txt','r','utf-8') as infile:
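On Python 3 the built-in open() accepts an encoding argument as well, so importing codecs is optional. Below is a sketch of the question's function written that way, again assuming the file is UTF-8. The path parameter, the blank-line check, and the demo rows are additions for illustration and not part of the original code or data.

```python
import os
import tempfile


def extract_row(path, encoding="utf-8"):
    """Return the first whitespace-separated field of each non-empty line."""
    codes = []
    with open(path, encoding=encoding) as infile:
        for line in infile:
            fields = line.split()
            if fields:  # skip blank lines instead of raising IndexError
                codes.append(fields[0])
    return codes


# Demo on a throwaway file with made-up rows (not the real country_codes.txt).
rows = (
    "AD AND 20 Andorra\n"
    "AX ALA 248 \u00c5land\n"
    "\n"
    "BL BLM 652 Saint Barth\u00e9lemy\n"
)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "country_codes_demo.txt")
    with open(demo_path, "w", encoding="utf-8") as out:
        out.write(rows)
    codes = extract_row(demo_path)

print(codes)  # ['AD', 'AX', 'BL']
```

Returning the list instead of appending to a module-level variable keeps the function reusable, and passing the encoding explicitly means the result no longer depends on the machine's locale.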

