
Default character encoding of Python 2 / Python 3 files in PyCharm

As we all know, when we write Python 2 code in PyCharm, we habitually add #-*-coding:utf-8 -*- at the top of each .py file so that the file's content is treated as UTF-8.

My question: if we leave out this header, which encoding does the .py file use in PyCharm? Does that depend on PyCharm?

If we create an .html file, we can set the encoding with a tag in its head:

<meta charset="UTF-8">

But what about a plain text file?

Does a plain text file use the default encoding?

And if we use .py files with Python 3, is the effect the same as adding the #-*-coding:utf-8 -*- line to Python 2 .py files?

After reading some relevant material, I arrived at the explanation below.

First of all, there are many character sets and encodings:

ASCII, GB2312, GBK, Unicode, UTF-8, and others.

ASCII cannot represent the characters of other languages (such as Chinese, Japanese, and Korean). It defines only 128 characters, and in the 8-bit extensions only the 128 values 0x80-0xFF were left for each country's specific characters. Those extensions were not an international standard, so software installed in another country often displayed garbled text. After many years of development came Unicode, which aims to cover all the characters in the world. But a fixed-width Unicode encoding takes a lot of space: it spends several bytes on every character, even on a Latin letter that needs only one byte. So there came a variable-length encoding, UTF-8, in which a Latin letter takes one byte and a Chinese character usually takes three.
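The size difference described above can be checked in Python 3. This is only a minimal sketch; the two sample characters are arbitrary choices, and UTF-32 stands in here for "a fixed-width Unicode encoding".

```python
# Compare how many bytes each encoding needs per character (Python 3).
samples = ["A", "中"]  # a Latin letter and a Chinese character (arbitrary examples)

sizes = {}
for ch in samples:
    sizes[ch] = {
        "utf-8": len(ch.encode("utf-8")),
        # "utf-32-le" is used so that no byte order mark is added to the count
        "utf-32": len(ch.encode("utf-32-le")),
    }
    print(ch, sizes[ch])

# ASCII has no code for Chinese characters, so encoding raises an error.
try:
    "中".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("ASCII can encode the Chinese character:", ascii_ok)
```

The Latin letter costs one byte in UTF-8 and four in UTF-32, while the Chinese character costs three bytes in UTF-8 and four in UTF-32.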

Unicode and the ISO/IEC 10646 Universal Character Set (UCS) have a much wider array of characters and their various encoding forms have begun to supplant ISO/IEC 8859 and ASCII rapidly in many environments. While ASCII is limited to 128 characters, Unicode and the UCS support more characters by separating the concepts of unique identification (using natural numbers called code points) and encoding (to 8-, 16- or 32-bit binary formats, called UTF-8, UTF-16 and UTF-32).
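To make the distinction between a code point and its encodings concrete, here is a small Python 3 sketch. The sample character is an arbitrary choice; the big-endian codec names are used only so the bytes read in the same order as the code point.

```python
ch = "中"  # arbitrary example character

# The code point: the character's unique number in Unicode.
code_point = ord(ch)
print(f"U+{code_point:04X}")

# The same code point serialized by three different encoding forms.
encoded = {
    name: ch.encode(name).hex()
    for name in ("utf-8", "utf-16-be", "utf-32-be")
}
for name, hex_bytes in encoded.items():
    print(name, hex_bytes)

# Decoding any of the byte sequences gives back the same character.
round_trip = all(
    bytes.fromhex(hex_bytes).decode(name) == ch
    for name, hex_bytes in encoded.items()
)
print("round trip ok:", round_trip)
```

The code point stays the same (U+4E2D) while the stored bytes differ by encoding: e4b8ad, 4e2d, and 00004e2d.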

In PyCharm, if the project interpreter is Python 2, the default source encoding of a .py file is ASCII, because that is what the Python 2 interpreter assumes when no encoding is declared. To avoid encoding problems in your project, add #-*-coding:utf-8 -*- to the template under PyCharm Preferences -> Editor -> File and Code Templates -> Python Script, so that every new file starts with it. If the project interpreter is Python 3, the default source encoding is UTF-8, so there is no need to add the line.
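The Python 3 side of this can be observed with the standard library's tokenize.detect_encoding, which reports the source encoding Python 3 will assume for a file. This is a sketch only: declared_encoding is a hypothetical helper name, the sample sources are made up, and it says nothing about Python 2, which falls back to ASCII when there is no header.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the source encoding Python 3 assumes for these bytes.

    Hypothetical helper for illustration; it wraps tokenize.detect_encoding,
    which looks for a coding header in the first two lines.
    """
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# Made-up sample sources: no header, the UTF-8 header, and a GBK header.
no_header = b"print('hello')\n"
utf8_header = b"#-*-coding:utf-8 -*-\nprint('hello')\n"
gbk_header = b"#-*-coding:gbk -*-\nprint('hello')\n"

print(declared_encoding(no_header))
print(declared_encoding(utf8_header))
print(declared_encoding(gbk_header))
```

Under Python 3 the file without a header and the file with the UTF-8 header are treated the same way, which is why the header is redundant there; a different header such as gbk still overrides the default.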


 