
What does '^@' mean in vim?

When I cat a file in bash I get the following:

$ cat /tmp/file 
microsoft

When I view the same file in vim I get the following:

^@m^@i^@c^@r^@o^@s^@o^@f^@t^@

How can I identify and remove these "non-printable" characters? What does '^@' mean in vim?

(Just a piece of background information: the file was created by base 64 decoding and cutting from the pssh header of an mpd file for Microsoft Playready)

What you see is Vim's visual representation of unprintable characters. It is explained at :help 'isprint':

    Non-printable characters are displayed with two characters:
          0 -  31    "^@" - "^_"
         32 - 126    always single characters
            127      "^?"
        128 - 159    "~@" - "~_"
        160 - 254    "| " - "|~"
            255      "~?"

Therefore, ^@ stands for a null byte = 0x00. These (and other non-printable characters) can come from various sources, but in your case it's an ...

encoding issue

If you look closely at your output in Vim, every second byte is a null byte; in between are the expected characters. This is a clear indication that the file uses a multibyte encoding (utf-16, big endian, no byte order mark, to be precise) that Vim did not detect, so it opened the file as latin1 or similar. (The terminal typically ignores null bytes, which is why cat shows the text as expected.)
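The "every second byte is null" pattern can also be checked mechanically. Here is a minimal Python sketch, assuming the text is mostly ASCII; the function name `guess_utf16_byte_order` and the 0.9/0.1 thresholds are illustrative choices for this example, not something Vim itself does:

```python
def guess_utf16_byte_order(data: bytes):
    """Guess the UTF-16 byte order of mostly-ASCII text from its null-byte pattern."""
    even = data[0::2]  # bytes at offsets 0, 2, 4, ...
    odd = data[1::2]   # bytes at offsets 1, 3, 5, ...
    if not even or not odd:
        return None
    even_nul = even.count(0) / len(even)
    odd_nul = odd.count(0) / len(odd)
    # The thresholds below are arbitrary assumptions for this sketch.
    if even_nul > 0.9 and odd_nul < 0.1:
        return "utf-16-be"  # the zero high byte comes first
    if odd_nul > 0.9 and even_nul < 0.1:
        return "utf-16-le"  # the zero high byte comes second
    return None


# Bytes as Vim showed them: ^@m^@i^@c^@r^@o^@s^@o^@f^@t
print(guess_utf16_byte_order(b"\x00m\x00i\x00c\x00r\x00o\x00s\x00o\x00f\x00t"))  # utf-16-be
```

This is only a heuristic: text with many non-ASCII characters has few null bytes in either position, and the function then returns None.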

To fix this, you can either explicitly specify the encoding:

:edit ++enc=utf-16 /tmp/file
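Outside Vim, the same idea applies: decode the bytes with the right encoding instead of deleting the null bytes. A small Python sketch, assuming the data really is big-endian UTF-16 without a BOM; `reencode` is a hypothetical helper name:

```python
def reencode(data: bytes, source: str = "utf-16-be", target: str = "utf-8") -> bytes:
    """Decode bytes from the source encoding and encode them in the target one."""
    return data.decode(source).encode(target)


raw = "microsoft".encode("utf-16-be")
print(reencode(raw))  # b'microsoft'

# Stripping null bytes only happens to work for pure ASCII text.
accented = "caf\u00e9".encode("utf-16-be")   # b'\x00c\x00a\x00f\x00\xe9'
stripped = accented.replace(b"\x00", b"")    # b'caf\xe9', which is not valid UTF-8
print(reencode(accented))                    # b'caf\xc3\xa9'
```

Decoding fails with a UnicodeDecodeError if the input is not valid UTF-16 (for example, if it has an odd number of bytes), which is a useful signal that the assumed encoding is wrong.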

Or tweak the 'fileencodings' option, so that Vim can automatically detect this. However, be aware that ambiguities (as in your case) make this prone to fail:

For an empty file or a file with only ASCII characters most encodings will work and the first entry of 'fileencodings' will be used (except "ucs-bom", which requires the BOM to be present).

That's why a byte order mark (BOM) is recommended for 16-bit encodings; but that assumes that you have control over the output encoding.
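To illustrate how a BOM removes the ambiguity, here is a short Python sketch that checks for a UTF-16 BOM. The helper name `utf16_bom` is made up for this example; `codecs.BOM_UTF16_BE` and `codecs.BOM_UTF16_LE` are standard library constants:

```python
import codecs


def utf16_bom(data: bytes):
    """Return the UTF-16 byte order announced by a leading BOM, or None."""
    # Caveat for this sketch: a UTF-32 little-endian BOM also starts with
    # the UTF-16 little-endian BOM bytes and is not distinguished here.
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    return None


with_bom = codecs.BOM_UTF16_BE + "microsoft".encode("utf-16-be")
print(utf16_bom(with_bom))                         # utf-16-be
print(with_bom.decode("utf-16"))                   # microsoft
print(utf16_bom("microsoft".encode("utf-16-be")))  # None
```

With the BOM present, Python's generic "utf-16" codec picks the byte order on its own, much like the "ucs-bom" entry in 'fileencodings' does for Vim.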

^@ is Vim's representation of a null byte. The ^ indicates a non-printable control character, with the following ASCII character indicating which control character it is.

^@ == 0 (NUL)
^A == 1
^B == 2
...
^H == 8
^K == 11
...
^Z == 26
^[ == 27
^\ == 28
^] == 29
^^ == 30
^_ == 31
^? == 127

9 and 10 aren't displayed this way because they are Tab and Line Feed respectively, which Vim normally renders as whitespace and as a line break.

32 to 126 are printable ASCII characters (starting with Space).
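The mapping above can be sketched in a few lines of Python, which also gives a way to locate such bytes in a file's contents. The function names `caret_notation` and `control_bytes` are hypothetical, and the sketch covers only the ASCII control range (0 to 31 and 127), not bytes above 127:

```python
def caret_notation(byte: int) -> str:
    """Return the caret notation for an ASCII control byte (0-31 or 127)."""
    if not (byte < 32 or byte == 127):
        raise ValueError("not an ASCII control byte")
    # Flipping bit 6 maps 0 -> '@', 1 -> 'A', ..., 31 -> '_', and 127 -> '?'.
    return "^" + chr(byte ^ 0x40)


def control_bytes(data: bytes):
    """List (offset, notation) for control bytes, skipping Tab (9) and Line Feed (10)."""
    found = []
    for offset, byte in enumerate(data):
        if byte in (9, 10):
            continue
        if byte < 32 or byte == 127:
            found.append((offset, caret_notation(byte)))
    return found


print(caret_notation(0))                   # ^@
print(caret_notation(27))                  # ^[
print(control_bytes(b"\x00m\x00i\tx\n\x7f"))  # [(0, '^@'), (2, '^@'), (7, '^?')]
```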
