Python unicode file name with strange characters

I am having problems with unicode in Python 2.7. I get some data from a database and store it in a variable called country, with the value u"Espa\xf1a".

If I go to the shell and write the following:

>>> country
u'Espa\xf1a'
>>> print country
España

That's OK, no problem with that. The problem comes when I try to create a file called España.txt as follows:

>>> import os
>>> country = u"Espa\xf1a"
>>> file = "%s.txt" % country
>>> file
u'Espa\xf1a.txt'
>>> print file
España.txt
>>> os.system("touch %s" % file)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf1' in position 10: ordinal not in range(128)

I don't know why this is happening. Could anyone help me? Thanks in advance!

It could well be that your operating system isn't allowing you to create the file. Instead of using touch to create the file, try the Python way, with open():

f = open(file, 'w')
...
f.close()

I am assuming that you are trying to write to the file, and you want the file to be called 'España.txt'.

os.system("touch %s" % file)

The POSIX command line and filesystem are a natively byte-based environment; Unicode strings are not available there. Non-ASCII characters are encoded into filenames and commands using some encoding, which can vary from system to system (though on modern Linux it would typically be UTF-8).
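
To make the byte-based point concrete, here is a minimal sketch showing that the same text becomes different bytes depending on the encoding chosen, and that the default ASCII codec (the one named in the traceback) cannot represent ñ at all. The variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
country = u"Espa\xf1a"

# The same text gives different byte sequences under different encodings.
utf8_bytes = country.encode("utf-8")
latin1_bytes = country.encode("latin-1")
print(repr(utf8_bytes))
print(repr(latin1_bytes))

# ASCII has no byte for U+00F1, which is the failure seen in the question.
try:
    country.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii cannot encode it: %s" % exc)
```

Whichever byte sequence ends up on disk is what other programs will see as the filename, so the encoding needs to match what the rest of the system expects.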

sys.getfilesystemencoding() will give you Python's best guess of what encoding is in use on the local filesystem (if you mount other filesystems all bets are off), from variables that are hopefully defined in the environment.
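
As a rough sketch of how that guess might be used, the helper below (encode_filename is a hypothetical name, not part of any library) encodes a text filename with an explicit encoding, falling back to sys.getfilesystemencoding() when none is given. It assumes the guess is usable; on a misconfigured environment the fallback can still raise UnicodeEncodeError for non-ASCII names.

```python
import sys

def encode_filename(name, encoding=None):
    """Encode a text filename to bytes (hypothetical helper).

    Falls back to Python's filesystem-encoding guess when no
    encoding is passed explicitly.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    return name.encode(encoding)

print("filesystem encoding guess: %s" % sys.getfilesystemencoding())
# Passing the encoding explicitly keeps this line independent of the locale.
print(repr(encode_filename(u"Espa\xf1a.txt", "utf-8")))
```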

You should never call os.system with variables interpolated into the command. If there are unexpected characters in the variable, they can end up executing arbitrary commands, with disastrous security consequences.
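
The risk is easy to see with a harmless stand-in. In the sketch below, echo replaces touch so that nothing is created, and subprocess.check_output with shell=True stands in for os.system so the output can be captured; hostile_name is an invented value, and a POSIX shell is assumed.

```python
import subprocess

# An invented "filename" that contains shell syntax.
hostile_name = "x.txt; echo INJECTED"

# Interpolated into a command string and parsed by the shell (as os.system
# would do): the semicolon ends the first command and starts a second one.
via_shell = subprocess.check_output("echo %s" % hostile_name, shell=True)

# Passed as one element of an argument list: no shell is involved, so the
# whole string reaches echo as a single argument.
via_list = subprocess.check_output(["echo", hostile_name])

print(repr(via_shell))
print(repr(via_list))
```

With the string form the shell runs two commands; with the list form the semicolon is just another character in the argument.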

You can use interfaces like subprocess.call(['touch', filename.encode(sys.getfilesystemencoding())]) to take care of the necessary argument escaping, but in general you should avoid launching an external command for anything like touch that you can do directly from Python.

For example:

open(filename, 'wb').close()

(When you open a Unicode filename, Python encodes the name to the default filesystem encoding for you.)
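
Putting the two approaches side by side, here is a small sketch that creates one file through touch with an argument list and another through open, both inside a temporary directory so nothing is left in the working directory. The function names and file names are made up for illustration, and it assumes a POSIX system with touch on the PATH and a filesystem encoding that can represent ñ.

```python
# -*- coding: utf-8 -*-
import os
import subprocess
import sys
import tempfile

def create_with_touch(path):
    """Create `path` by running touch with an argument list (no shell)."""
    encoded = path.encode(sys.getfilesystemencoding())
    return subprocess.call(['touch', encoded])

def create_with_open(path):
    """Create an empty file directly from Python; the name is encoded for us."""
    open(path, 'wb').close()

workdir = tempfile.mkdtemp()
touched = os.path.join(workdir, u"Espa\xf1a-touch.txt")
opened = os.path.join(workdir, u"Espa\xf1a-open.txt")

create_with_touch(touched)
create_with_open(opened)

print(sorted(os.listdir(workdir)))
```

The open version needs no external process and no manual encoding step, which is why it is usually the better choice.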

Try this: print("Espa\ña"). That should print España.

Try: os.system("touch %s" % file.encode('utf-8'))
