Implementing a character encoding conversion function

Question

I need to implement a character encoding conversion function in C++ or C (C preferred) that converts from a custom encoding scheme (designed to support multiple languages in a single encoding) to UTF-8.

Our encoding is essentially arbitrary; it looks like this:

Because the mapping is arbitrary, I am thinking of using std::map with two separate maps, one from our encoding to UTF-8 and one for the reverse direction, and using those maps for the conversion. Is there a more optimized data structure or way to do this?

Answer 1

If your code points are contiguous, just make a big char * array (one UTF-8 byte string per code) and translate using that. I don't really understand what you mean by a UTF-8 code point: UTF-8 has representations, and Unicode has code points. If you want code points, use an array of ints.

const int mycode_to_unicode [] = {
   0x00ff,
   0x0102,
   // etc.
 };

If there are holes in your encoding, you could put a sentinel value such as -1 in the unused slots to catch errors.
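To make the forward direction concrete, here is a minimal sketch that looks a custom code up in such an array and then encodes the resulting Unicode code point as UTF-8. The table contents, MYCODE_COUNT, utf8_encode and mycode_to_utf8 are illustrative names and values, not part of the original answer, and the sketch assumes int is at least 32 bits wide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical forward table: index = custom code, value = Unicode code point.
   -1 marks a hole in the custom encoding. */
static const int mycode_to_unicode[] = {
    0x00ff,
    0x0102,
    -1,      /* hole: no character assigned to custom code 2 */
    0x20ac,
};
#define MYCODE_COUNT (sizeof mycode_to_unicode / sizeof mycode_to_unicode[0])

/* Encode one code point as UTF-8 into out (room for 4 bytes required).
   Returns the number of bytes written, or 0 for an invalid code point. */
static size_t utf8_encode(int cp, unsigned char *out)
{
    assert(out != NULL);
    if (cp < 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
        return 0;                       /* negative sentinel or surrogate */
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xc0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3f));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xe0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
        out[2] = (unsigned char)(0x80 | (cp & 0x3f));
        return 3;
    }
    out[0] = (unsigned char)(0xf0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3f));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3f));
    out[3] = (unsigned char)(0x80 | (cp & 0x3f));
    return 4;
}

/* Convert one custom code to UTF-8.
   Returns the byte count, or 0 if the code is out of range or a hole. */
static size_t mycode_to_utf8(unsigned mycode, unsigned char *out)
{
    if (mycode >= MYCODE_COUNT)
        return 0;
    return utf8_encode(mycode_to_unicode[mycode], out);
}
```

Because the -1 sentinel is rejected by the encoder, a hole in the table surfaces as a zero-length result that the caller can treat as a conversion error.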

Going the other way just means making an array, of the same size, of structs like this:

struct {
   int mycode;
   int unicode;
};

Copy the keys (indices) of the array into mycode and the values into unicode, sort the result with qsort using a function that compares the unicode values, and then use bsearch with the same function to go from a code point to your encoding.

This assumes you want to use C.
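The reverse direction described above might be sketched as follows. The struct tag code_pair, the function names, and the table contents are assumptions made for illustration, and the example table has no holes; with holes you would skip the -1 entries when copying.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical forward table: index = custom code, value = code point. */
static const int mycode_to_unicode[] = { 0x0152, 0x00ff, 0x20ac, 0x0102 };
#define MYCODE_COUNT (sizeof mycode_to_unicode / sizeof mycode_to_unicode[0])

struct code_pair {
    int mycode;
    int unicode;
};

static struct code_pair unicode_to_mycode[MYCODE_COUNT];

/* Order pairs by code point; shared by qsort() and bsearch(). */
static int cmp_unicode(const void *a, const void *b)
{
    const struct code_pair *pa = a;
    const struct code_pair *pb = b;
    return (pa->unicode > pb->unicode) - (pa->unicode < pb->unicode);
}

/* Copy index/value pairs out of the forward table, then sort by code point. */
static void build_reverse_table(void)
{
    for (size_t i = 0; i < MYCODE_COUNT; i++) {
        unicode_to_mycode[i].mycode = (int)i;
        unicode_to_mycode[i].unicode = mycode_to_unicode[i];
    }
    qsort(unicode_to_mycode, MYCODE_COUNT, sizeof unicode_to_mycode[0],
          cmp_unicode);

    /* The mapping must be 1:1: no code point may appear twice. */
    for (size_t i = 1; i < MYCODE_COUNT; i++)
        assert(unicode_to_mycode[i - 1].unicode < unicode_to_mycode[i].unicode);
}

/* Returns the custom code for a code point, or -1 if it is not mapped. */
static int unicode_to_code(int unicode)
{
    struct code_pair key = { 0, unicode };
    const struct code_pair *hit =
        bsearch(&key, unicode_to_mycode, MYCODE_COUNT,
                sizeof unicode_to_mycode[0], cmp_unicode);
    return hit ? hit->mycode : -1;
}
```

build_reverse_table() has to run once before the first call to unicode_to_code(); after that each lookup costs O(log n) comparisons.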

Answer 2

A hash table would surely be the fastest solution.

If the table is known up front and never changes (as I understand is the case), you can determine a perfect hash for it, meaning that you will have no collisions and a guaranteed constant retrieval time (possibly at the expense of some space).

I've used gperf a couple of times, but I suggest you check Bob Jenkins' great page on hashing (and on minimal perfect hashing as well).
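To illustrate the idea of a perfect hash for a fixed key set, here is a small sketch that searches for a multiplier under which every key lands in its own slot. This is a brute-force toy, not what gperf or Bob Jenkins' tools generate; the key set, slot count, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pair {
    uint32_t from;   /* custom code */
    uint32_t to;     /* Unicode code point */
};

/* Hypothetical fixed mapping, known before the program runs. */
static const struct pair pairs[] = {
    { 0x101, 0x00ff }, { 0x102, 0x0102 }, { 0x1a0, 0x20ac }, { 0x2f3, 0x0152 },
};
#define NPAIRS (sizeof pairs / sizeof pairs[0])
#define SLOT_BITS 3
#define NSLOTS (1u << SLOT_BITS)   /* more slots than keys: space for speed */

static uint32_t multiplier;        /* chosen by find_perfect_hash() */
static int slot_to_pair[NSLOTS];   /* index into pairs[], or -1 if empty */

/* Multiplicative hash: keep the top SLOT_BITS bits of key * mul. */
static uint32_t hash_slot(uint32_t key, uint32_t mul)
{
    return (uint32_t)(key * mul) >> (32 - SLOT_BITS);
}

/* Try pseudo-random odd multipliers until no two keys share a slot.
   Returns 1 on success, 0 if none was found within the attempt budget. */
static int find_perfect_hash(void)
{
    uint32_t mul = 0x9e3779b1u;
    for (int attempt = 0; attempt < 100000; attempt++) {
        int ok = 1;
        for (size_t s = 0; s < NSLOTS; s++)
            slot_to_pair[s] = -1;
        for (size_t i = 0; i < NPAIRS && ok; i++) {
            uint32_t s = hash_slot(pairs[i].from, mul);
            if (slot_to_pair[s] != -1)
                ok = 0;                      /* collision: reject this mul */
            else
                slot_to_pair[s] = (int)i;
        }
        if (ok) {
            multiplier = mul;
            return 1;
        }
        mul = (mul * 1664525u + 1013904223u) | 1u;   /* next candidate */
    }
    return 0;
}

/* One hash, one comparison, no probing.
   Returns 1 and stores the mapped value, or 0 if key is not in the table. */
static int perfect_lookup(uint32_t key, uint32_t *to)
{
    assert(to != NULL);
    int i = slot_to_pair[hash_slot(key, multiplier)];
    if (i < 0 || pairs[i].from != key)
        return 0;
    *to = pairs[i].to;
    return 1;
}
```

In a real build the search would normally run offline and the resulting multiplier and slot table would be emitted as constants, which is roughly the role a generator such as gperf plays for string keys.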

Answer 3

As you build the constant mappings up front and use them only for lookups, a hash table might be a better fit than std::map. There is no hash table implementation in the C++ standard (C++11 has since added std::unordered_map), but many free implementations are available, both in C and C++.

These are C implementations:

http://www.cl.cam.ac.uk/~cwc22/hashtable/

http://wiki.portugal-a-programar.org/c:snippet:hash_table_c

Glibc hash tables.
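For a sense of what such a table looks like for integer keys, here is a minimal open-addressing sketch with linear probing. It is not the API of any of the libraries listed above; the names, the fixed capacity, and the EMPTY_KEY sentinel (assumed never to be a valid custom code) are all illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAPACITY_BITS 4
#define CAPACITY (1u << CAPACITY_BITS)   /* power of two, well above entry count */
#define EMPTY_KEY 0xffffffffu            /* assumed never a valid custom code */

struct entry {
    uint32_t key;
    uint32_t value;
};

static struct entry table[CAPACITY];
static size_t used;

/* Mark every slot as empty; call once before inserting. */
static void table_init(void)
{
    for (size_t i = 0; i < CAPACITY; i++)
        table[i].key = EMPTY_KEY;
    used = 0;
}

/* Multiplicative hash: top CAPACITY_BITS bits of key * constant. */
static size_t first_slot(uint32_t key)
{
    return (size_t)((uint32_t)(key * 2654435761u) >> (32 - CAPACITY_BITS));
}

/* Insert or overwrite a mapping, probing linearly on collisions. */
static void table_put(uint32_t key, uint32_t value)
{
    assert(key != EMPTY_KEY);
    assert(used < CAPACITY - 1);   /* keep one slot empty so probes terminate */
    size_t i = first_slot(key);
    while (table[i].key != EMPTY_KEY && table[i].key != key)
        i = (i + 1) & (CAPACITY - 1);
    if (table[i].key == EMPTY_KEY)
        used++;
    table[i].key = key;
    table[i].value = value;
}

/* Returns 1 and stores the value, or 0 if the key is absent. */
static int table_get(uint32_t key, uint32_t *value)
{
    size_t i = first_slot(key);
    while (table[i].key != EMPTY_KEY) {
        if (table[i].key == key) {
            *value = table[i].value;
            return 1;
        }
        i = (i + 1) & (CAPACITY - 1);
    }
    return 0;
}
```

Since the mapping never changes after start-up, the table can be filled once and sized generously, which keeps probe sequences short.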

Answer 4

Not sure if I understand the question, but if the 1:1 mapping is not too big, using a preinitialized struct array may be the way to go (depending on the encoding, you could write a program that emits the contents of the init table once):

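struct MAP { int from, to; };

/* Keep the entries sorted by 'from' so that bsearch() can be used. */
struct MAP somemapping[] = {
    { 0x101,  0x01 },
    { 0x102,  0x02 },
    /* ... */
};

/* Number of entries actually present in the table. */
enum { MAXMAP = sizeof somemapping / sizeof somemapping[0] };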

Using bsearch() would be a reasonably quick way to do lookups.
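A lookup along those lines could be sketched as below. The third table entry, MAP_COUNT, cmp_from, check_sorted, and map_lookup are hypothetical additions for illustration; the one hard requirement is that the table is sorted by from, since bsearch() depends on it.

```c
#include <assert.h>
#include <stdlib.h>

struct MAP { int from, to; };

/* Hypothetical entries; they must be listed in ascending order of 'from'. */
static const struct MAP somemapping[] = {
    { 0x101, 0x01 },
    { 0x102, 0x02 },
    { 0x1a0, 0x7f },
};
#define MAP_COUNT (sizeof somemapping / sizeof somemapping[0])

/* Order entries by their 'from' field. */
static int cmp_from(const void *a, const void *b)
{
    const struct MAP *ma = a;
    const struct MAP *mb = b;
    return (ma->from > mb->from) - (ma->from < mb->from);
}

/* Debug check: bsearch() only works on a sorted table without duplicates. */
static void check_sorted(void)
{
    for (size_t i = 1; i < MAP_COUNT; i++)
        assert(somemapping[i - 1].from < somemapping[i].from);
}

/* Returns the mapped value, or 'fallback' if 'from' is not in the table. */
static int map_lookup(int from, int fallback)
{
    struct MAP key = { from, 0 };
    const struct MAP *hit = bsearch(&key, somemapping, MAP_COUNT,
                                    sizeof somemapping[0], cmp_from);
    return hit ? hit->to : fallback;
}
```

If the generated table cannot be guaranteed to be sorted, sorting it once with qsort() and the same comparison function before the first lookup achieves the same effect.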

If the code is extremely performance-sensitive, you could build an index-based lookup table:

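int lookup[65536];   /* static storage: unmapped codes read back as 0 */

/* Build the lookup table once, before the first conversion. */
void init(void)
{
  for (int i = 0; i < MAXMAP; i++) {
     lookup[somemapping[i].from] = somemapping[i].to;
  }
}

int foo(int from)
{
  /* ... */
  /* quick lookup; 'from' must be in the range 0..65535 */
  int to = lookup[from];
  /* ... */
  return to;
}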

4 answers

Answer 1: 2, accepted
Answer 2: 2, 2009-11-17 12:54:00
Answer 3: 1, 2009-11-17 12:33:23
Answer 4: 1, 2009-11-17 12:52:27
