Which character set is "é" from? (Python: with a filename containing "é", how do I use os.path.exists, filecmp.cmp, shutil.move?)

Question

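Which character set is é from? On Windows, Notepad saves an ANSI text file containing this character without trouble. Insert certain other characters and you get an error. é also seems to work fine in an ASCII terminal in PuTTY (are CP437 and IBM437 the same?), whereas that other kind of character does not.

I know that one is Unicode, not ASCII. But what is é? It does not give the error I get with Unicode in Notepad, yet Python threw SyntaxError: Non-ASCII character '\xc3' in file on line, but no encoding declared; until I added the "magic comment" as suggested in Python NLTK: SyntaxError: Non-ASCII character '\xc3' in file (Sentiment Analysis -NLP).

With the "magic comment" added I no longer get that error, but os.path.isfile() says that a filename containing é does not exist. Ironically, the character é appears in the name of the author of the PEP that the error links to, Marc-André Lemburg.

EDIT: If I print the path of the file, the accented e shows up as ├⌐, but I can copy and paste é into the command prompt.

EDIT2: see below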
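
The ├⌐ output is consistent with the two UTF-8 bytes of é (0xC3 0xA9) being displayed by a console that uses code page 437. The following is a minimal sketch in Python 3 (not the Python 2 used in the question), assuming that the console code page is CP437 and that "ANSI" means Windows-1252, as it does on Western-European Windows installs:

```python
import codecs

# é is U+00E9; written as an escape so this file is pure ASCII.
E_ACUTE = "\u00e9"

utf8_bytes = E_ACUTE.encode("utf-8")      # two bytes: 0xC3 0xA9
cp1252_bytes = E_ACUTE.encode("cp1252")   # one byte: 0xE9 (assumed "ANSI" code page)
cp437_bytes = E_ACUTE.encode("cp437")     # one byte: 0x82 (DOS/console code page)

# ASCII only covers code points 0-127, so é cannot be encoded in it.
try:
    E_ACUTE.encode("ascii")
    in_ascii = True
except UnicodeEncodeError:
    in_ascii = False

# Reading the UTF-8 bytes as if they were CP437 yields the two
# box-drawing style characters seen in the question.
mojibake = utf8_bytes.decode("cp437")

# Python resolves "IBM437" and "cp437" to the same codec.
same_codec = codecs.lookup("IBM437").name == codecs.lookup("cp437").name

print(utf8_bytes, cp1252_bytes, cp437_bytes, in_ascii, same_codec)
print(ascii(mojibake))
```

So é exists in several 8-bit code pages (with different byte values) and in Unicode, but not in ASCII; which bytes end up in the file depends on the encoding the editor used when saving.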

Private    > cat scratch.py   ### LOL cat scratch :3
# coding=utf-8
file_name = r"Filéname"
file_name = unicode(file_name)
Private    > python scratch.py
Traceback (most recent call last):
  File "scratch.py", line 3, in <module>
    file_name = unicode(file_name)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
Private    >

EDIT3:

Private    > PS1="Private    > " ; echo code below ; cat scratch.py ; echo =======  ; echo output below ; python scratch.py
code below
# -*- coding: utf-8 -*-

file_name = r"Filéname"
file_name = unicode(file_name, encoding="utf-8")

# I have code here to determine a path depending on the hostname of the
# machine, the folder paths contain no Unicode characters, for my debug
# version of the script, I will hardcode the redacted hostname.
hostname = "One"
if hostname == "One":
    folder = "C:/path/folder_one"
elif hostname == "Two":
    folder = "C:/path/folder_two"
else:
    folder = "C:/path/folder_three"

path = "%s/%s" % (folder, file_name)
path = unicode(path, encoding="utf-8")


print path
=======
output below
Traceback (most recent call last):
  File "scratch.py", line 18, in <module>
    path = unicode(path, encoding="utf-8")
TypeError: decoding Unicode is not supported
Private    >

Answer 1

You need to tell unicode which encoding the string is in; in this case it is utf-8, not ascii. The file header should be # -*- coding: utf-8 -*- (see Encoding Declarations).

# -*- coding: utf-8 -*-
file_name = r"Filéname"
file_name = unicode(file_name, encoding="utf-8")
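
The difference between the failing call in EDIT2 and the call above can be reproduced on raw bytes. This is a sketch in Python 3, where the byte string `raw` is an assumed stand-in for what Python 2 stores for the literal "Filéname" in a UTF-8 source file:

```python
# UTF-8 bytes of "Filéname": the é occupies two bytes, 0xC3 0xA9.
raw = b"Fil\xc3\xa9name"

# Decoding with ASCII (what Python 2's unicode() does when no encoding
# is given) fails at the first byte above 127.
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# Naming the real encoding succeeds and yields 8 characters from 9 bytes.
text = raw.decode("utf-8")

print(ascii_error.start, len(raw), len(text))
```

The failing offset is 3, which matches "byte 0xc3 in position 3" in the traceback from the question.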

Help on class unicode in module __builtin__:

class unicode(basestring)
 |  unicode(object='') -> unicode object
 |  unicode(string[, encoding[, errors]]) -> unicode object
 |
 |  Create a new Unicode object from the given encoded string.
 |  encoding defaults to the current default string encoding.
 |  errors can be 'strict', 'replace' or 'ignore' and defaults to 'strict'.
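
As the help text says, the two-argument form decodes an encoded (byte) string. In EDIT3 of the question, `"%s/%s" % (folder, file_name)` already produces a unicode object in Python 2, because `file_name` was decoded earlier, so decoding `path` a second time raises the TypeError. The same mistake can be sketched in Python 3, where `str` plays the role of `unicode`; the helper `to_text` is hypothetical and only illustrates decoding once, at the boundary:

```python
def to_text(value, encoding="utf-8"):
    """Hypothetical helper: decode bytes, return text unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


file_name = to_text(b"Fil\xc3\xa9name")   # decoded once, now text
folder = "C:/path/folder_one"
path = "%s/%s" % (folder, file_name)      # formatting text gives text

# Decoding something that is already text is an error.
try:
    str(path, encoding="utf-8")
    double_decode_error = None
except TypeError as exc:
    double_decode_error = exc

# The helper is safe to call on either type.
safe_path = to_text(path)

print(type(double_decode_error).__name__, ascii(safe_path))
```

In the question's script, dropping the second `unicode(path, encoding="utf-8")` call would avoid this particular error.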

As I mentioned in an earlier comment, switch to Python 3. Python 2 on a Windows file system with Unicode characters in file names can be a nightmare.
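
For comparison, here is a minimal Python 3 sketch of the three calls named in the title. All strings are Unicode text in Python 3, so no decoding step is needed. The file names and the temporary directory are made up for illustration and are not the asker's real paths:

```python
import filecmp
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # "\u00e9" is é; a literal é also works in a UTF-8 source file.
    src = os.path.join(workdir, "Fil\u00e9name.txt")
    copy = os.path.join(workdir, "Fil\u00e9name_copy.txt")
    dest_dir = os.path.join(workdir, "moved")
    os.mkdir(dest_dir)

    with open(src, "w", encoding="utf-8") as handle:
        handle.write("same contents\n")
    shutil.copyfile(src, copy)

    exists_before = os.path.exists(src)

    # shallow=False compares file contents, not just os.stat() data.
    identical = filecmp.cmp(src, copy, shallow=False)

    # Moving into an existing directory keeps the base name;
    # shutil.move returns the final destination path.
    moved_to = shutil.move(src, dest_dir)
    exists_after_move = os.path.exists(src)
    moved_exists = os.path.isfile(moved_to)

print(exists_before, identical, exists_after_move, moved_exists)
```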

Solution 1
0 2020-05-12 01:23:54
