Character encoding problem

I was recently editing a Unicode-encoded text file that also includes Thai characters (alongside "normal" characters). For some reason, after each sequence of Thai characters, a new line appeared.

After some mucking around with C, trying to remove all newline characters, I fired up vim to inspect the file. Apparently, after each Thai character sequence, there is a "^M" string (without quotes).

Why is this happening, and what's that "^M"? I've found that I can fix the problem by removing the last three characters from the Thai string, but there surely must be a more elegant way to fix this ...

This has nothing to do with the fact that you have some Thai characters in the file. The ^M ("caret M") is how vim displays a carriage return (CR, byte 0x0D), which is part of the Microsoft (DOS) CRLF line ending. Run dos2unix on the file to get rid of these before editing it in vim.
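If dos2unix isn't available, the same cleanup can be done in a few lines of script. Below is a minimal Python sketch, not a replacement for dos2unix. The helper names `strip_carriage_returns` and `convert_file` are made up for illustration, and it assumes the file is UTF-8; if "Unicode-encoded" means UTF-16 in your case, pass `encoding="utf-16"` instead.

```python
def strip_carriage_returns(text: str) -> str:
    """Turn CRLF line endings into LF and drop any stray CR characters."""
    # Handle CRLF first so each line break survives as a single LF,
    # then remove whatever lone carriage returns are left over.
    return text.replace("\r\n", "\n").replace("\r", "")


def convert_file(path: str, encoding: str = "utf-8") -> None:
    """Rewrite the file at `path` in place without carriage returns.

    The default encoding is an assumption; adjust it to match your file.
    """
    # newline="" turns off Python's own newline translation, so the CRs
    # are read exactly as stored and nothing is re-added on write.
    with open(path, "r", encoding=encoding, newline="") as f:
        text = f.read()
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(strip_carriage_returns(text))


# A Thai string followed by a DOS line ending, like the case in the question.
sample = "สวัสดี\r\nhello\r\n"
cleaned = strip_carriage_returns(sample)
```

Decoding the text first, rather than deleting raw 0x0D bytes, keeps this safe for multi-byte encodings. If you would rather stay in vim, `:%s/\r$//` removes a trailing carriage return from every line.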
