
How does terminal encoding work in Vim?

In GNOME Terminal (3.4.1.1):

$ echo $LANG
en_US.UTF-8

$ echo 你好 | iconv -f UTF8 -t UTF32BE | tee hello.txt
O`Y}
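
The odd-looking output above is just the UTF-32BE byte stream shown raw by a UTF-8 terminal. A minimal Python sketch of the same encoding step (the variable names are illustrative, not part of the original question) makes the bytes visible:

```python
# The same text that `echo 你好` emits (echo appends a newline).
text = "你好\n"

utf32be = text.encode("utf-32-be")  # what iconv -t UTF32BE produces
utf8 = text.encode("utf-8")         # what a UTF-8 terminal expects

print(utf32be)  # b'\x00\x00O`\x00\x00Y}\x00\x00\x00\n'
print(utf8)     # b'\xe4\xbd\xa0\xe5\xa5\xbd\n'

# A terminal renders nothing for the NUL padding bytes, so only the
# ASCII-range bytes of each 4-byte code unit remain visible.
visible = utf32be.replace(b"\x00", b"").decode("ascii")
print(repr(visible))  # 'O`Y}\n'
```

U+4F60 becomes the bytes 00 00 4F 60 and U+597D becomes 00 00 59 7D; the bytes 4F, 60, 59 and 7D happen to be the ASCII characters O, `, Y and }.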

In Vim (7.3):

$ vim -N -u NONE --cmd 'set tenc=utf32 enc=utf32 fencs=utf32be' hello.txt
你好
~
~
~    

:set tenc enc fenc
  termencoding=ucs-4
  encoding=ucs-4
  fileencoding=ucs-4

The terminal cannot display UTF-32 characters.
Yet after setting several of Vim's encoding options to UTF-32,
Vim still displays the UTF-32 file without any problems.
Why?

Interesting. You can run your command inside script to verify that Vim is actually writing UTF-8 to your terminal.
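
One way to inspect such a capture is sketched below, assuming the session was recorded to a file (script writes to a file named typescript by default). The helper encodings_present is hypothetical, and the sample bytes are a stand-in for a real capture, not actual Vim output. Vim may also place escape sequences between characters, so a negative result from a real capture is weaker evidence than a positive one.

```python
def encodings_present(captured: bytes, text: str) -> dict:
    """Report which encodings of `text` occur verbatim in the captured bytes."""
    candidates = ("utf-8", "utf-32-be", "utf-32-le")
    return {enc: text.encode(enc) in captured for enc in candidates}


# Stand-in for the contents of a typescript file: a cursor-positioning
# escape sequence followed by the text as UTF-8. For a real capture, use
# open("typescript", "rb").read() instead.
sample = b"\x1b[1;1H" + "你好".encode("utf-8") + b"\r\n~"

result = encodings_present(sample, "你好")
print(result)
```

If Vim really writes UTF-8 to the terminal, only the utf-8 entry should be true for a real capture.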

The help for 'charconvert' and 'encoding' gives oblique hints about the internal operation, but I did not find a corresponding hint that the same behavior applies to termencoding. Respectively:

Vim internally uses UTF-8 instead of UCS-2 or UCS-4.

and

When "unicode", "ucs-2" or "ucs-4" is used, Vim internally uses utf-8.

So, we will use the source (version 7.3.548, specifically) to find out what is happening.

The value of the termencoding / tenc option is stored in the global variable p_tenc.

  • did_set_string_option() seems to handle the setting of string-valued options.

    • When handling termencoding, it calls convert_setup() to set up output_conv (for converting from encoding to termencoding).

      The comment for convert_setup gives the first hint as to what is happening:

      Note: cannot be used for conversion from/to ucs-2 and ucs-4 (will use utf-8 instead).

      • convert_setup calls convert_setup_ext() with TRUE for both of the {from,to}_unicode_is_utf8 parameters.

        • When {from,to}_unicode_is_utf8 are true (they are), it sets the local variables {from,to}_is_utf8 based on whether the specified encodings have the ENC_UNICODE property (ucs-4 does, as do all of Vim's utf-… and ucs-… encodings).
          When it comes time to open an iconv descriptor, Vim substitutes utf-8 if {from,to}_is_utf8 are true (in this case, they are).
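
The substitution described in the steps above can be modeled roughly as follows. This is a paraphrase in Python of the behavior described, not Vim's actual C code: the alias table is a small illustrative subset taken from the transcript in the question, the prefix test stands in for the ENC_UNICODE property, and the function names merely echo the ones in Vim's source.

```python
# Illustrative subset of Vim's alias handling (assumption: only the
# names that appear in the question's transcript are listed here).
ALIASES = {"utf32": "ucs-4", "utf32be": "ucs-4"}


def has_enc_unicode(enc: str) -> bool:
    """Stand-in for the ENC_UNICODE property: utf-... and ucs-... names."""
    return enc.startswith(("utf-", "ucs-"))


def iconv_name(enc: str, unicode_is_utf8: bool) -> str:
    """Name that would be handed to iconv for one side of the conversion."""
    enc = ALIASES.get(enc, enc)
    is_utf8 = unicode_is_utf8 and has_enc_unicode(enc)
    return "utf-8" if is_utf8 else enc


def convert_setup(from_enc: str, to_enc: str) -> tuple:
    """Model of convert_setup: both *_unicode_is_utf8 flags are TRUE."""
    return iconv_name(from_enc, True), iconv_name(to_enc, True)


print(convert_setup("utf32", "utf32"))    # ('utf-8', 'utf-8')
print(convert_setup("latin1", "utf32"))   # ('latin1', 'utf-8')
```

With both encoding and termencoding set to utf32, both sides collapse to utf-8 in this model, which matches the observation that the terminal receives UTF-8.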

Ultimately, the values of encoding and termencoding are handled in the same way here. utf-32 is mapped to ucs-4, which has ENC_UNICODE, and Vim substitutes UTF-8 for the requested encoding. Maybe there are some hints in the commit logs that indicate why termencoding is treated this way; I will leave that archeology to someone else, though.

The code path for handling fileencoding is different. It only forces UTF-8 for the "internal side" of the conversion (and only if a "Unicode" encoding is in effect).
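
Putting the two code paths together, the data flow for the file in the question can be sketched as below. This only illustrates the behavior described in this answer under the stated settings; the variable names are hypothetical and nothing here is taken from Vim's source.

```python
# Contents of hello.txt as written by iconv in the question.
file_bytes = "你好\n".encode("utf-32-be")

# Reading: the file side keeps its real encoding (fileencoding), while
# the internal side is forced to UTF-8 even though 'encoding' says ucs-4.
internal = file_bytes.decode("utf-32-be").encode("utf-8")

# Displaying: both 'encoding' and 'termencoding' are replaced by UTF-8,
# so the bytes reach the terminal unchanged.
terminal_bytes = internal

print(terminal_bytes.decode("utf-8"), end="")
```

The file is the only place where UTF-32 bytes actually exist, which is why a UTF-8 terminal can still show the text.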


 