
Character encoding problem

Original post: 2010-03-05 21:47:55 | Tags: c, encoding, character-encoding

I was recently editing a Unicode-encoded text file that also includes Thai characters (alongside "normal" characters). For some reason, after each sequence of Thai characters, a new line appeared.

After some mucking around with C, trying to remove all newline characters, I fired up vim to inspect the file. Apparently, after each Thai character sequence, there appears a "^M" string (without quotes).

Why is this happening, and what's that "^M"? I've found that I can fix the problem by removing the last three characters from the Thai string, but there surely must be a more elegant way to fix this ...

1 answer

This has nothing to do with the fact that you have some Thai characters in the file. The ^M ("caret M", i.e. Ctrl-M) is how vim displays a carriage return (CR, byte 0x0D), the first half of the CRLF line ending used by DOS and Windows. Run dos2unix on the file to get rid of these before editing it in vim.
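If you would rather stay in C, as the question attempted, the same cleanup can be sketched roughly as below. This assumes the file is UTF-8 and has already been read into a NUL-terminated buffer; the name `strip_cr` is made up for illustration. In UTF-8 every byte of a multi-byte character (Thai included) is 0x80 or higher, so dropping 0x0D bytes cannot damage the text. That does not hold for UTF-16, where a byte-wise filter like this is not safe.

```c
#include <stddef.h>

/* Hypothetical helper: remove every carriage return (0x0D, shown as ^M
 * in vim) from a NUL-terminated buffer, in place.
 * Assumes the buffer holds UTF-8 (or plain ASCII) text.
 * Returns the new length of the string, not counting the terminator. */
size_t strip_cr(char *buf)
{
    char *dst = buf;

    for (const char *src = buf; *src != '\0'; ++src) {
        if (*src != '\r') {
            *dst++ = *src;   /* keep everything except CR */
        }
    }
    *dst = '\0';

    return (size_t)(dst - buf);
}
```

When reading the file for this, open it in binary mode (`fopen(path, "rb")`) so the C runtime does not translate line endings on its own before the filter sees them.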