Git messes up non-ASCII characters in a Linux container

I have a .NET Core (C#) project with the following line in one of the classes:

var input = "£";

But when I do a git clone in a Docker container (microsoft/dotnet:2.2-sdk), the character gets mangled and is displayed as � (in bash, using cat).

And when I run it, its UTF-8 bytes are [239, 191, 189] = [EF, BF, BD], which seems to be the so-called Unicode replacement character (U+FFFD).

The Windows editor that I use is VS 2017, but the character is displayed properly on other Windows machines and parsed properly by the dotnet run/test commands, so I don't think the problem is that the character was saved incorrectly.

Any ideas why I am seeing such a mess and how to solve it?

Some details

  • I get the bytes using Encoding.UTF8.GetBytes("£");
  • It works perfectly well on a Windows 10 machine
  • The Linux version is Debian GNU/Linux 9 (stretch), according to cat /etc/os-release
  • locale -a returns C C.UTF-8 POSIX
  • On Windows, Notepad++ reports the file as ANSI when opened and displays the character correctly.

Running fgrep 'var input' file.cs | od -tx1 -c gives:

0000100  76  61  72  20  69  6e  70  75  74  20  3d  20  22  a3  22  3b
          v   a   r       i   n   p   u   t       =       " 243   "   ;

Your file contains the single byte a3 (octal 243 in the dump above), which is the Windows-1252 encoding of the character £. Your Linux system displays � because a lone a3 byte is not a valid UTF-8 sequence.
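To check this claim in the container, here is a minimal sketch, assuming iconv, od and tr are installed (they are on a typical Debian image). The file name pound.bin is just a scratch name for this illustration. It writes the lone byte a3, shows that strict UTF-8 decoding rejects it, and shows what the same byte becomes when it is read as Windows-1252 and re-encoded as UTF-8.

```shell
# Write the single byte 0xA3 (octal 243), as seen in the od dump.
# pound.bin is a hypothetical scratch file, not part of the project.
printf '\243' > pound.bin

# Strict UTF-8 decoding rejects a lone 0xA3, so iconv exits non-zero.
if iconv -f UTF-8 -t UTF-8 pound.bin >/dev/null 2>&1; then
  utf8_status=valid
else
  utf8_status=invalid
fi
echo "read as UTF-8: $utf8_status"

# Read as Windows-1252 (CP1252) and re-encoded as UTF-8, the byte
# becomes the two-byte sequence for U+00A3 (the pound sign).
converted_hex=$(iconv -f CP1252 -t UTF-8 pound.bin | od -An -tx1 | tr -d ' \n')
echo "read as CP1252, re-encoded as UTF-8: $converted_hex"
```

The two-byte result is what a UTF-8 aware tool expects to find in the source file in place of the single a3.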

You should configure Visual Studio to use UTF-8 instead of Windows-1252.
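Changing the editor setting only affects future saves, so the file that is already committed may need a one-off conversion. The following is a rough sketch, assuming iconv is available and that the file really is Windows-1252 throughout. The first line only creates a stand-in for file.cs so the example is self-contained; on a real checkout you would skip it and run the conversion on the actual file.

```shell
# Stand-in for the real source file: the line from the question,
# with the pound sign stored as the single Windows-1252 byte 0xA3.
printf 'var input = "\243";\n' > file.cs

# Re-encode from Windows-1252 to UTF-8 through a temporary file,
# and replace the original only if the conversion succeeded.
iconv -f CP1252 -t UTF-8 file.cs > file.cs.utf8 && mv file.cs.utf8 file.cs

# Dump the bytes again: the pound sign should now be c2 a3.
fixed_hex=$(od -An -tx1 file.cs | tr -d ' \n')
echo "$fixed_hex"
```

After converting, commit the file so that every clone receives the UTF-8 version. Running the conversion a second time on an already converted file would corrupt it, so check the bytes with od first.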
