
Best C# collection for large keys

I am developing a multilingual ASP.NET web site. I want the users to be able to add translations 'on the fly', therefore I am storing the translations in a database and maintaining an ASP.NET Application object containing all the translations (for performance). I am using a simple Hashtable to store the translations; I store this in the Application object and reference it when rendering the page.

Some of the translation entries are short, some are long. It all depends upon what the user wants translated. Whenever the site is rendering, say, control labels or help text, it checks the Hashtable to see if it contains a translation for the English version, and if it does then it uses the Hashtable version (assuming a foreign-language user - I use a separate Application-level object for each language).

A typical Hashtable entry might be: key='Things to do today', value='Mon Charge pour aujourd'hui'. The code looks up 'Things to do today' in the table - if a translation is found it uses it, if not it just uses the English version.

My question is this: is a Hashtable the most performant collection for this? Could I somehow improve performance/memory usage by using a different collection or key structure, or even a different mechanism altogether? My thinking is that I'd rather use the Application object than do separate database reads for each translation - there may be tens of translations per page render.
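For reference, the lookup-with-fallback pattern described in the question can be sketched as below. This is a minimal illustration, not the asker's actual code; the class and method names are invented, and a generic `Dictionary<string, string>` is used in place of the non-generic `Hashtable` since it is type-safe and generally preferred in modern C#:

```csharp
using System;
using System.Collections.Generic;

// Sketch of an application-scoped translation cache: one dictionary per
// language, keyed by the English text. In ASP.NET the inner dictionary
// would typically live in an Application-level object, as the question
// describes.
public static class TranslationCache
{
    // language code -> (English text -> translated text)
    private static readonly Dictionary<string, Dictionary<string, string>> Caches =
        new Dictionary<string, Dictionary<string, string>>();

    public static void Load(string language, Dictionary<string, string> entries)
    {
        Caches[language] = entries;
    }

    // Falls back to the English text when no translation exists.
    public static string Translate(string language, string english)
    {
        Dictionary<string, string> cache;
        string translated;
        if (Caches.TryGetValue(language, out cache) &&
            cache.TryGetValue(english, out translated))
        {
            return translated;
        }
        return english;
    }
}
```

`TryGetValue` does a single hash lookup per call, which matches the "check, then use or fall back" flow described above.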

I suggest an alternative approach - create a "script" which converts translations from your custom source (a database, or whatever you have) into .NET resource files, and insert a command which runs it into the Before-Build event of your project. This way you will be able to use the native localization functionality of the .NET platform and still be able to store translations in any place you want.
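Once the pre-build script has produced the resource files, consuming them is standard .NET localization. A rough sketch of the consuming side, assuming the generated resources use a base name like "MyApp.Labels" (the name and key are illustrative, not from the original post):

```csharp
using System.Globalization;
using System.Reflection;
using System.Resources;

// Sketch of reading generated resources. ResourceManager walks the
// culture hierarchy (e.g. fr-FR -> fr -> neutral) automatically, so the
// English fallback behaviour from the question comes for free.
public static class Labels
{
    private static readonly ResourceManager Manager =
        new ResourceManager("MyApp.Labels", Assembly.GetExecutingAssembly());

    public static string Get(string key, CultureInfo culture)
    {
        // GetString returns null for unknown keys; fall back to the key.
        return Manager.GetString(key, culture) ?? key;
    }
}
```

This fragment only runs against an assembly that actually embeds the generated resources, so it is shown here as a shape rather than a standalone program.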

As Iain Galloway commented:

It's almost always worth going with the grain of the platform, rather than against it.

If you want fast access, you could try a Trie.
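A minimal sketch of that idea, keyed on the characters of the English phrase (an illustrative implementation, not a production-ready one). Lookup walks one node per character, so its cost is O(key length) with no hashing of the whole string:

```csharp
using System;
using System.Collections.Generic;

// Minimal trie mapping English phrases to translations. A terminal node
// holds the translated string; intermediate nodes hold null.
public class Trie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public string Value; // translation stored at the terminal node
    }

    private readonly Node _root = new Node();

    public void Add(string key, string value)
    {
        var node = _root;
        foreach (var c in key)
        {
            Node child;
            if (!node.Children.TryGetValue(c, out child))
            {
                child = new Node();
                node.Children.Add(c, child);
            }
            node = child;
        }
        node.Value = value;
    }

    public bool TryGetValue(string key, out string value)
    {
        var node = _root;
        foreach (var c in key)
        {
            if (!node.Children.TryGetValue(c, out node))
            {
                value = null;
                return false;
            }
        }
        value = node.Value;
        return value != null;
    }
}
```

Whether this beats a well-distributed hash table for phrase-length keys is workload-dependent; as the other answer says, benchmark it.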

I think that memory is not so much of an issue, given that the number of bytes you have to store is more or less constant. The size of the key-value mapping is equivalent no matter what mechanism you use.

However, for performance this does not hold. When using a key-value collection (like e.g. a hashtable or dictionary), the hash used in the dictionary is calculated from the input. I.e. if your key is "abcdefg", it is hashed using the GetHashCode() function. This function obviously uses more computational power if the input to GetHashCode (i.e. the input string "abcdefg") is longer. You could, for example, make GetHashCode O(1) in terms of the length of the string by overriding the function and letting it return a hash code based on a constant number of input characters of your string. Of course this could break the balance of your hashtable, resulting in slower lookups. You should benchmark this to be sure.
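One way to sketch that idea without touching the string type itself is a custom `IEqualityComparer<string>` that hashes only a fixed-length prefix (the prefix length of 16 is an arbitrary illustrative choice). Equality still compares the full strings, so correctness is preserved; only hash distribution, and therefore lookup speed, may suffer when many keys share a prefix:

```csharp
using System;
using System.Collections.Generic;

// Comparer whose hash reads at most the first 16 characters, making the
// hashing cost independent of key length. Full-string equality keeps
// lookups correct even when two keys collide on the prefix.
public sealed class PrefixHashComparer : IEqualityComparer<string>
{
    private const int PrefixLength = 16;

    public bool Equals(string x, string y)
    {
        return string.Equals(x, y, StringComparison.Ordinal);
    }

    public int GetHashCode(string s)
    {
        int hash = 17;
        int length = Math.Min(s.Length, PrefixLength);
        for (int i = 0; i < length; i++)
        {
            hash = hash * 31 + s[i];
        }
        return hash;
    }
}
```

It plugs straight into a dictionary: `new Dictionary<string, string>(new PrefixHashComparer())`. Keys that share their first 16 characters all land in one bucket, which is exactly the balance risk the paragraph above warns about.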

Another solution is to keep a dictionary/hashtable for each language and use some short abbreviation as the key. On lookup you then have a switch statement which picks the correct dictionary and extracts the string using the abbreviated key. The downside of this is that you increase memory usage, but in theory decrease lookup time. In this case too, you have to benchmark in order to be sure whether there is any (positive) difference.
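A sketch of that short-key variant, with invented key names and only two languages for brevity. The short identifier is cheap to hash regardless of how long the displayed sentence is:

```csharp
using System;
using System.Collections.Generic;

// Per-language dictionaries keyed by a short identifier instead of the
// full English sentence. Key names here are illustrative.
public static class ShortKeyTranslations
{
    private static readonly Dictionary<string, string> English = new Dictionary<string, string>
    {
        { "todo.title", "Things to do today" }
    };

    private static readonly Dictionary<string, string> French = new Dictionary<string, string>
    {
        { "todo.title", "Mon Charge pour aujourd'hui" }
    };

    public static string Lookup(string language, string key)
    {
        switch (language)
        {
            case "fr":
                string fr;
                return French.TryGetValue(key, out fr) ? fr : English[key];
            default:
                return English[key]; // English acts as the fallback language
        }
    }
}
```

Note this trades away the question's original design, where the English text itself is the key, so pages would need to reference the short identifiers instead.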

As a sidenote, this sounds to me like a premature optimization. I don't think that such lookups will be a bottleneck in your application.

