
How to display UTF-8 characters in Vim correctly

I need to edit files that contain UTF-8 characters, and I want to use Vim for it. Before I get accused of asking something that was asked before: I've read the Vim documentation on encoding, fileencoding[s], termencoding and more, googled the subject, and read this question among other texts.

Here is a sentence with a UTF-8 character in it that I use as a test case.

From Japanese 勝 (katsu) meaning "victory"

If I open the (UTF-8) file with Notepad, it is displayed correctly. When I open it with Vim, the best I get is a black square where the Japanese character for katsu should be. Changing any of the settings for fileencoding or encoding does not make a difference.

Why is Vim giving me a black square where Notepad displays the character without problems? If I copy the text from Vim and paste it into Notepad, it is displayed correctly, which indicates that the text is not corrupted but only displayed wrong. But which setting(s) influence that?

Here is the relevant part of my _vimrc:

if has("multi_byte")
  set encoding=utf-8
  if &termencoding == ""
    let &termencoding = &encoding
  endif
  setglobal fileencoding=utf-8
  set fileencodings=ucs-bom,utf-8,latin1
endif

The actual settings when I open the file are:

encoding=utf-8
fileencoding=utf-8
termencoding=utf-8

My PC is running Windows 10, language is English (United States).

This is what the content of the file looks like after loading it in Vim and converting it to hex:

0000000: efbb bf46 726f 6d20 4a61 7061 6e65 7365  ...From Japanese
0000010: 20e5 8b9d 2028 6b61 7473 7529 206d 6561   ... (katsu) mea
0000020: 6e69 6e67 2022 7669 6374 6f72 7922 0d0a  ning "victory"..

The first three bytes (ef bb bf) are the UTF-8 BOM that Microsoft tools write. The rest is plain ASCII, except for the second, third and fourth bytes on the second line, which must represent the non-ASCII character somehow.
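The dump can be checked outside Vim. The following is a minimal sketch in Python (an assumption; the question itself only uses Vim and Notepad) that rebuilds the bytes from the hex dump, splits off the BOM, and decodes the three non-ASCII bytes. The variable names are illustrative only.

```python
import codecs

# Bytes copied from the three rows of the hex dump, with the spaces removed.
hex_dump = (
    "efbbbf46726f6d204a6170616e657365"
    "20e58b9d20286b6174737529206d6561"
    "6e696e672022766963746f7279220d0a"
)
raw = bytes.fromhex(hex_dump)

# The first three bytes are the UTF-8 byte order mark.
bom = raw[:3]
has_utf8_bom = bom == codecs.BOM_UTF8

# Bytes 2-4 of the second row sit at offsets 0x11..0x13 (17..19).
kanji_bytes = raw[17:20]
kanji = kanji_bytes.decode("utf-8")

# "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present.
text = raw.decode("utf-8-sig")

print(has_utf8_bom)
print(kanji_bytes.hex(), "U+%04X" % ord(kanji))
print(repr(text))
```

If the bytes decode cleanly like this, the file content is valid UTF-8, which is consistent with the observation that pasting into Notepad shows the right character.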

There are two steps to make Vim successfully display a UTF-8 character:

  1. File encoding. You've correctly identified that this is controlled by the 'encoding' and 'fileencodings' options. Once you've set this up properly (which you can verify via :setlocal fileencoding?, or the ga command on a known character, or at least by checking that each character is represented by a single cell, not its constituent byte values), there's:
  2. Character display. That is, you need to use a font that contains glyphs for the characters in your file. Unicode is large; most fonts don't contain all glyphs. In my experience, that's less of a problem on Linux, which seems to have some automatic fallbacks built in. But on Windows, you need to have a proper font installed and configured (gVim: in 'guifont').

For example, to properly display Japanese Kanji characters, you need to install the Far Eastern language support in Windows, and then set 'guifont' to a font that covers them:

:set guifont=MS_Gothic:h12:cSHIFTJIS
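Coming back to step 1, here is a rough model of how a list like fileencodings=ucs-bom,utf-8,latin1 is worked through: each candidate is tried in order and the first one that fits wins. This is a simplified sketch in Python, not Vim's actual implementation. The function name guess_fileencoding is hypothetical, and the sketch only recognises the UTF-8 BOM, whereas Vim's ucs-bom entry also covers other Unicode BOMs.

```python
import codecs

def guess_fileencoding(data: bytes,
                       candidates=("ucs-bom", "utf-8", "latin1")) -> str:
    """Return the first candidate that fits the data (hypothetical helper)."""
    for name in candidates:
        if name == "ucs-bom":
            # Simplification: only the UTF-8 BOM is checked here.
            if data.startswith(codecs.BOM_UTF8):
                return "utf-8"
            continue
        try:
            # A strict decode fails on the first invalid byte sequence.
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    return ""

print(guess_fileencoding(b"\xef\xbb\xbfFrom Japanese"))  # BOM present
print(guess_fileencoding("勝".encode("utf-8")))           # valid UTF-8, no BOM
print(guess_fileencoding(b"caf\xe9"))                     # invalid UTF-8, falls through
```

Because latin1 accepts any byte sequence, it only makes sense as the last entry in the list; anything placed after it would never be tried.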


 