
Special chars in Python

I have to use special characters in my Python application, for example ƃ. I have information like this:

U+0183 LATIN SMALL LETTER B WITH TOPBAR

General Character Properties

In Unicode since: 1.1
Unicode category: Letter, Lowercase

Various Useful Representations

UTF-8: 0xC6 0x83
UTF-16: 0x0183

C octal escaped UTF-8: \306\203
XML decimal entity: &#387;

But when I just put the symbols into a Python script, I get an error:

Non-ASCII character '\xc8' ...

How can I use it in strings for my application?

You should tell the interpreter which encoding you're using, because apparently on your system it defaults to ASCII. See PEP 263. In your case, place the following at the top of your file:

# -*- coding: utf-8 -*-

Note that you don't have to write exactly that; PEP 263 allows more freedom, to accommodate several popular editors which use slightly different conventions for the same purpose. Additionally, this line may also be placed on the second line of the file, e.g. when the first is used for the shebang.
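To illustrate that freedom, here is a minimal sketch that checks a few comment styles against the pattern quoted in the Python language reference for encoding declarations. The helper name declared_encoding and the sample lines are only illustrative assumptions, and the interpreter's real detection also involves details (such as the byte-order mark) that this sketch ignores.

```python
from __future__ import print_function
import re

# Pattern given in the Python language reference for encoding declarations.
CODING_RE = re.compile(r"coding[=:]\s*([-\w.]+)")


def declared_encoding(line):
    """Return the encoding named in a comment line, or None (hypothetical helper)."""
    if not line.lstrip().startswith("#"):
        return None  # only comment lines can carry a declaration
    match = CODING_RE.search(line)
    return match.group(1) if match else None


candidates = [
    "# -*- coding: utf-8 -*-",          # Emacs style
    "# vim: set fileencoding=utf-8 :",  # Vim style
    "# coding=utf-8",                   # bare form
    "# just a comment",                 # no declaration
]
for line in candidates:
    print(repr(line), "->", declared_encoding(line))
```

The first three lines all name utf-8; the last one yields None.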

While the answers so far are all correct, I thought I'd provide a more complete treatment:

The simplest way to represent a non-ASCII character in a script literal is to use the u prefix and \u or \U escapes, like so:

print u"Look \u0411\u043e\u0440\u0438\u0441, a G-clef: \U0001d11e"

This illustrates:

  1. using the u prefix to make sure the string is a unicode object
  2. using the \u escape (four hex digits) for characters in the Basic Multilingual Plane (U+FFFF and below)
  3. using the \U escape (eight hex digits) for characters in other planes (U+10000 and above)
  4. that Ƃ (U+0182 LATIN CAPITAL LETTER B WITH TOPBAR) and Б (U+0411 CYRILLIC CAPITAL LETTER BE) are just one example of many confusingly similar Unicode code points
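The points in this list can be checked with the standard unicodedata module. This is a minimal sketch that assumes Python 3.3 or later (where the u prefix is still accepted); the variable names are only illustrative.

```python
from __future__ import print_function
import unicodedata

text = u"Look \u0411\u043e\u0440\u0438\u0441, a G-clef: \U0001d11e"

cyrillic_be = u"\u0411"     # \u takes exactly four hex digits
g_clef = u"\U0001d11e"      # \U takes exactly eight hex digits
latin_b_topbar = u"\u0182"  # looks like the Cyrillic letter, but is not

# Print the code point and official Unicode name of each character.
for ch in (cyrillic_be, latin_b_topbar, g_clef):
    print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))

# The two look-alikes are different code points and compare unequal.
print(cyrillic_be == latin_b_topbar)
```

Comparing names rather than glyphs is a simple way to tell look-alike characters apart when debugging.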

The default script encoding for Python that works everywhere is ASCII. As such, you'd have to use the above escapes to encode literals of non-ASCII characters. You can inform the Python interpreter of the encoding of your script with a line like:

# -*- coding: utf-8 -*-

This only changes the encoding of your script. But then you could write:

print u"Look Борис, a G-clef: 𝄞"

Note that you still have to use the u prefix to obtain a unicode object, not a str object.
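The difference between the text and its encoded bytes can be seen with the character from the question. This is a minimal sketch, assuming the file is saved as UTF-8 and run on Python 3.3 or later (where every string literal is already text and the u prefix is optional); the variable names are only illustrative.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

small_b_topbar = u"ƃ"                     # one character of text
encoded = small_b_topbar.encode("utf-8")  # its UTF-8 byte representation

print(len(small_b_topbar))  # counts characters
print(len(encoded))         # counts bytes

# The bytes match the "UTF-8: 0xC6 0x83" line quoted in the question,
# and decoding them gives back the original text.
assert encoded == b"\xc6\x83"
assert encoded.decode("utf-8") == small_b_topbar
```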

Lastly, it is possible to change the default encoding used for str ... but this is not recommended, as it is a global change and may not play well with other Python code.

Do you store the Python file as UTF-8? Does your editor support UTF-8? Are you using unicode strings like so?

foo = u'ƃƃƃƃƃ'

Declare Unicode strings.

somestring = u'æøå'

In Python it should be

u"\u0183"

The u before the opening quote indicates that the string contains Unicode characters.

See here for reference:

http://www.fileformat.info/info/unicode/char/0183/index.htm
http://docs.python.org/tutorial/introduction.html#unicode-strings
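The properties listed for U+0183 in the question can also be reproduced from Python itself. This is a minimal sketch using the standard unicodedata module, assuming Python 3.3 or later; the variable names are only illustrative.

```python
from __future__ import print_function
import unicodedata

ch = u"\u0183"

# Look the character up by its official name instead of its code point.
by_name = unicodedata.lookup("LATIN SMALL LETTER B WITH TOPBAR")

print(unicodedata.name(ch))      # official Unicode name
print(unicodedata.category(ch))  # two-letter general category
print("&#%d;" % ord(ch))         # XML decimal entity
print(by_name == ch)
```

Here the category code Ll stands for "Letter, Lowercase", matching the properties quoted in the question.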
