How does terminal encoding work in vim?
$ echo $LANG
en_US.UTF-8
$ echo 你好 | iconv -f UTF8 -t UTF32BE | tee hello.txt
O`Y}
$ vim -N -u NONE --cmd 'set tenc=utf32 enc=utf32 fencs=utf32be' hello.txt
你好
~
~
~
:set tenc enc fenc
termencoding=ucs-4
encoding=ucs-4
fileencoding=ucs-4
The terminal cannot display UTF-32 characters. Yet after I modify several of Vim's encoding options, Vim can still display the UTF-32 file without any problems. Why?
Interesting. You can run your command inside the script command to verify that Vim is actually writing UTF-8 to your terminal.
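To know what to look for in the typescript file that script records, it helps to compare the two byte sequences. The sketch below uses Python only to compute them (it is not part of Vim, and the variable names are just for illustration). It also shows why the tee output in the question looks like O`Y}.

```python
# Compare the bytes of the sample text in the file's encoding and in UTF-8.
text = "你好"

utf32be = text.encode("utf-32-be")  # what iconv wrote to hello.txt (without the newline)
utf8 = text.encode("utf-8")         # what a UTF-8 terminal expects to receive

print(utf32be.hex(" "))  # 00 00 4f 60 00 00 59 7d
print(utf8.hex(" "))     # e4 bd a0 e5 a5 bd

# Dropping the NUL bytes leaves the printable ASCII that tee echoed.
visible = utf32be.replace(b"\x00", b"").decode("ascii")
print(visible)           # O`Y}
```

If the recorded output contains e4 bd a0 e5 a5 bd rather than the 00 00 4f 60 pattern, Vim is sending UTF-8.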
The help for 'charconvert' and 'encoding' gives oblique hints as to the internal operation, but I did not find a corresponding hint that this same behavior is applied to 'termencoding'. Respectively:
Vim internally uses UTF-8 instead of UCS-2 or UCS-4.
and
When "unicode", "ucs-2" or "ucs-4" is used, Vim internally uses utf-8.
So, we will use the source (version 7.3.548, specifically) to find out what is happening.
The value for the 'termencoding' / 'tenc' option is stored in the global variable p_tenc.
did_set_string_option() seems to handle the setting of string-valued options.
When handling 'termencoding', it calls convert_setup() to set up output_conv (for converting from 'encoding' to 'termencoding').
The comment for convert_setup() gives the first hint as to what is happening:
Note: cannot be used for conversion from/to ucs-2 and ucs-4 (will use utf-8 instead).
convert_setup() calls convert_setup_ext() with TRUE for both of the {from,to}_unicode_is_utf8 parameters.
When {from,to}_unicode_is_utf8 are true (they are), convert_setup_ext() sets the local variables {from,to}_is_utf8 based on whether the specified encodings have the ENC_UNICODE property (ucs-4 does, as do all of Vim's utf-… and ucs-… encodings).
For the iconv conversion, Vim substitutes utf-8 if {from,to}_is_utf8 are true (in this case, they are).
Ultimately, the values of 'encoding' and 'termencoding' are handled in the same way here. utf-32 is mapped to ucs-4, which has ENC_UNICODE, and Vim substitutes the desired encoding with UTF-8.
Maybe there are some hints in the commit logs that indicate why 'termencoding' is treated this way; I will leave that archeology to someone else, though.
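The substitution described above can be modeled in a few lines. This is a rough Python sketch of the idea, not a transcription of Vim's C code: the ALIASES and ENC_UNICODE tables are small hand-written subsets, and canonical() and effective_encoding() are hypothetical helper names.

```python
# Illustrative model of the substitution; this is NOT Vim's C code.
# Both tables are small hand-written subsets (assumptions for the example).
ALIASES = {"utf32": "ucs-4", "utf-32": "ucs-4", "utf8": "utf-8"}
ENC_UNICODE = {"utf-8", "utf-16", "ucs-2", "ucs-4"}


def canonical(name: str) -> str:
    """Map a user-supplied name to its canonical form, e.g. utf32 -> ucs-4."""
    name = name.lower()
    return ALIASES.get(name, name)


def effective_encoding(name: str, unicode_is_utf8: bool = True) -> str:
    """Return the encoding a converter would actually be set up with.

    A Unicode encoding is replaced by utf-8 when the caller passes a true
    *_unicode_is_utf8 flag; anything else is left alone.
    """
    canon = canonical(name)
    is_utf8 = unicode_is_utf8 and canon in ENC_UNICODE
    return "utf-8" if is_utf8 else canon


for option, value in [("encoding", "utf32"), ("termencoding", "utf32")]:
    print(option, "is shown as", canonical(value),
          "but converted as", effective_encoding(value))
```

With both options set to utf32, the model reports ucs-4 as the displayed value and utf-8 as the encoding used for conversion, which matches the :set output in the question and the UTF-8 bytes that reach the terminal.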
The code path for handling 'fileencoding' is different. It only forces UTF-8 for the “internal side” of the conversion (and only if a “Unicode” 'encoding' is in effect).
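Putting the two code paths together for the file in the question, the data flow looks roughly like this. It is a hypothetical Python model under the assumptions stated in the comments, not Vim's implementation, and the variable names are invented for the example.

```python
# Hypothetical model of the data flow for hello.txt; not Vim's code.
file_bytes = "你好\n".encode("utf-32-be")  # contents of hello.txt

# Reading: the file side keeps the real 'fileencoding' (utf-32be),
# while the internal side is forced to UTF-8.
internal = file_bytes.decode("utf-32-be").encode("utf-8")

# Display: 'encoding' and 'termencoding' are both treated as utf-8,
# so the output conversion leaves the bytes unchanged.
to_terminal = internal.decode("utf-8").encode("utf-8")

print(len(file_bytes), "bytes on disk,", len(to_terminal), "bytes to the terminal")
print(to_terminal)
```

Only the read from disk involves genuine UTF-32, which is why a UTF-8 terminal renders the text correctly.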