
Talk about the character encoding that Python 2/Python 3 files use by default in PyCharm

As we all know, when we write Python 2 code in PyCharm, we habitually add #-*-coding:utf-8 -*- at the top of each .py file so that its content is treated as UTF-8.
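That header line is a PEP 263 encoding declaration, which the Python interpreter itself reads. As a minimal sketch, assuming Python 3 and using a helper name (declared_source_encoding) that is our own invention, the standard library's tokenize.detect_encoding applies the same rules to the raw bytes of a source file:

```python
import io
import tokenize


def declared_source_encoding(source: bytes) -> str:
    """Return the encoding Python 3 would use to decode these .py file bytes.

    Hypothetical helper. tokenize.detect_encoding looks for a UTF-8 BOM or a
    coding comment in the first two lines and otherwise falls back to UTF-8.
    """
    encoding, _lines_read = tokenize.detect_encoding(io.BytesIO(source).readline)
    return encoding


with_header = b"#-*-coding:utf-8 -*-\nprint('hello')\n"
without_header = b"print('hello')\n"
gbk_header = b"# -*- coding: gbk -*-\nprint('hello')\n"

print(declared_source_encoding(with_header))     # utf-8
print(declared_source_encoding(without_header))  # utf-8 (the Python 3 default)
print(declared_source_encoding(gbk_header))      # gbk
```

Under Python 3 a file without a header falls back to utf-8; a Python 2 interpreter would fall back to ASCII instead, which this Python 3 sketch cannot show.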

I have a question: if we do not add that header, which encoding does a .py file use in PyCharm? Does that depend on PyCharm?

And if we create an .html file, we can set the encoding with a tag in its head:

<meta charset="UTF-8">

But what about a plain text file?

Does a plain text file use the default encoding?

And if we use .py files in Python 3, is the effect the same as adding the #-*-coding:utf-8 -*- line to Python 2 .py files?

After reading some relevant information, I arrived at the explanation below.

First of all, there are many character sets:

ASCII, GB2312, GBK, Unicode, UTF-8 and others.
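To make the list concrete, here is a small sketch (Python 3; encode_with is a hypothetical helper, not part of any library) that encodes the same Chinese character with several of these sets. GB2312 and GBK produce the same two bytes for this common character, UTF-8 produces three, and ASCII cannot represent it at all:

```python
from typing import Dict, List, Optional


def encode_with(text: str, codecs: List[str]) -> Dict[str, Optional[bytes]]:
    """Encode `text` with each codec; None means the codec cannot represent it."""
    result: Dict[str, Optional[bytes]] = {}
    for codec in codecs:
        try:
            result[codec] = text.encode(codec)
        except UnicodeEncodeError:
            # ASCII only covers code points 0-127, so Chinese raises here.
            result[codec] = None
    return result


for codec, data in encode_with("中", ["ascii", "gb2312", "gbk", "utf-8"]).items():
    print(codec, data)
```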

The ASCII character set cannot contain, and is not compatible with, the characters of many other languages (such as Chinese, Japanese and Korean). ASCII itself defines only 128 characters (0x00 to 0x7F). 8-bit extensions used the remaining 128 byte values (0x80 to 0xFF) for each country's specific characters; that is far too few for Chinese, so sets such as GB2312 and GBK use two bytes per Chinese character. None of these extensions were an international standard, so software installed in another country could easily display garbled text. After many years of development, Unicode appeared, which can contain all the characters in the world. However, a fixed-width Unicode encoding wastes memory: a Latin letter needs only one byte, yet a fixed-width encoding such as UTF-32 spends four bytes on every character. So the variable-length encoding UTF-8 came about: it uses one byte for ASCII characters and three bytes for most Chinese characters.
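The size trade-off can be checked directly. This sketch (Python 3; byte_lengths is a hypothetical helper) counts how many bytes a string takes in each Unicode encoding form; the little-endian UTF-16/UTF-32 codecs are used so that no byte order mark is added:

```python
from typing import Dict


def byte_lengths(text: str) -> Dict[str, int]:
    """Number of bytes `text` occupies in each Unicode encoding form."""
    # The -le codecs are used so that no byte order mark (BOM) is prepended.
    return {
        codec: len(text.encode(codec))
        for codec in ("utf-8", "utf-16-le", "utf-32-le")
    }


print(byte_lengths("A"))   # {'utf-8': 1, 'utf-16-le': 2, 'utf-32-le': 4}
print(byte_lengths("中"))  # {'utf-8': 3, 'utf-16-le': 2, 'utf-32-le': 4}
```

Note that for purely Chinese text UTF-16 is actually more compact than UTF-8; UTF-8's advantage lies mainly in ASCII-heavy text and in being byte-for-byte compatible with ASCII.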

Unicode and the ISO/IEC 10646 Universal Character Set (UCS) have a much wider array of characters and their various encoding forms have begun to supplant ISO/IEC 8859 and ASCII rapidly in many environments. While ASCII is limited to 128 characters, Unicode and the UCS support more characters by separating the concepts of unique identification (using natural numbers called code points) and encoding (to 8-, 16- or 32-bit binary formats, called UTF-8, UTF-16 and UTF-32).
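The separation between a code point and its encoded bytes shows up in a short sketch (Python 3.8+ for bytes.hex with a separator; describe is a hypothetical helper): the character 中 always has the code point U+4E2D, but UTF-8, UTF-16 and UTF-32 each turn it into different bytes. Big-endian codecs are used so that no byte order mark is emitted:

```python
from typing import Dict, Tuple


def describe(char: str) -> Tuple[str, Dict[str, str]]:
    """Return a character's code point and its bytes (as hex) in each UTF."""
    code_point = f"U+{ord(char):04X}"
    # Big-endian codecs are used so that no byte order mark (BOM) is added.
    forms = {
        codec: char.encode(codec).hex(" ")
        for codec in ("utf-8", "utf-16-be", "utf-32-be")
    }
    return code_point, forms


code_point, forms = describe("中")
print(code_point)  # U+4E2D
for codec, hex_bytes in forms.items():
    print(f"{codec:>9}: {hex_bytes}")
```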

In PyCharm, if the project Python interpreter is Python 2, the default source encoding is ASCII, so you have to add #-*-coding:utf-8 -*- (for example in Preferences -> Editor -> File and Code Templates -> Python Script, so that every new Python file gets it) to avoid character encoding issues in your project. If the project interpreter is Python 3, the default is UTF-8 and there is no need to set it. This default comes from the interpreter rather than from PyCharm: PEP 263 makes ASCII the default for Python 2, and PEP 3120 makes UTF-8 the default for Python 3, so in Python 3 an explicit utf-8 header has the same effect as leaving it out.
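A quick way to confirm the Python 3 behaviour is to compile raw source bytes, which the interpreter decodes with the same rules it uses for .py files. In this sketch (Python 3 only; run_source is a hypothetical helper), the source without a header and the source with a utf-8 header yield the same string, while a deliberately wrong latin-1 header produces garbled text. Under Python 2 the header-less version would instead fail with a SyntaxError about a non-ASCII character with no encoding declared:

```python
def run_source(source: bytes) -> str:
    """Compile raw .py file bytes and return the value bound to `s`.

    compile() honours a PEP 263 coding comment in byte input and otherwise
    decodes it as UTF-8, the Python 3 default source encoding.
    """
    namespace: dict = {}
    exec(compile(source, "<demo>", "exec"), namespace)
    return namespace["s"]


body = "s = '中文'\n"
no_header = body.encode("utf-8")
utf8_header = ("# -*- coding: utf-8 -*-\n" + body).encode("utf-8")
# Hypothetical mismatch: the bytes are UTF-8, but the header claims latin-1.
wrong_header = ("# -*- coding: latin-1 -*-\n" + body).encode("utf-8")

print(run_source(no_header) == run_source(utf8_header))  # True
print(repr(run_source(wrong_header)))  # garbled latin-1 reading of the UTF-8 bytes
```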
