UTF-8 String class for Java

I need to hold lots of String objects in memory (hundreds of MB), and I want to keep them in UTF-8, since in most cases that would take about half the memory the default implementation uses.
The default String class needs 60 bytes for a 12-character string (see http://blog.griddynamics.com/2010/01/java-tricks-reducing-memory-consumption.html).
Most of my strings are 10 to 20 characters long.
Is there an open-source library that offers a wrapper for such strings?
I know how to convert a String to a UTF-8 byte array, but I'm looking for a wrapper class that provides all the usual utility functions (hashing, equality, toString, fromString, etc.).
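As a rough sketch of the kind of wrapper being asked about, assuming a hand-rolled class rather than any library (the name Utf8String and its methods are hypothetical), the following keeps the content as UTF-8 bytes and supplies equals, hashCode, toString, a fromString factory and an ordering:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical immutable wrapper that stores a string as UTF-8 bytes.
 * Not taken from any library; a sketch of the interface the question asks for.
 */
final class Utf8String implements Comparable<Utf8String> {
    private final byte[] bytes; // UTF-8 encoded content
    private int hash;           // cached hash code, 0 means "not computed yet"

    private Utf8String(byte[] bytes) {
        this.bytes = bytes;
    }

    /** Encodes a java.lang.String as UTF-8. */
    static Utf8String fromString(String s) {
        return new Utf8String(s.getBytes(StandardCharsets.UTF_8));
    }

    /** Decodes back to a java.lang.String; allocates a new String on every call. */
    @Override
    public String toString() {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Length of the encoded form in bytes, not in chars. */
    int byteLength() {
        return bytes.length;
    }

    @Override
    public boolean equals(Object o) {
        // Content comparison; a bare byte[] would only compare by identity.
        return o instanceof Utf8String && Arrays.equals(bytes, ((Utf8String) o).bytes);
    }

    @Override
    public int hashCode() {
        int h = hash;
        if (h == 0) {
            h = Arrays.hashCode(bytes);
            hash = h;
        }
        return h;
    }

    /**
     * Unsigned lexicographic byte order. For UTF-8 this equals Unicode code point
     * order, which can differ from String.compareTo for characters outside the BMP.
     */
    @Override
    public int compareTo(Utf8String other) {
        byte[] a = bytes, b = other.bytes;
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return a.length - b.length;
    }
}
```

Each instance is still two objects (the wrapper plus its byte[]), so this saves the per-char cost but not the per-object header overhead; caching the hash costs another 4 bytes per instance.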

Apache Avro has a Utf8 wrapper class (org.apache.avro.util.Utf8) that implements CharSequence, but I don't know how much memory such objects consume.

Hadoop has the Text class (org.apache.hadoop.io.Text), which stores text as UTF-8 bytes and has much the kind of interface you want.

If you want a distinct object for each string and want them as compact as possible, use byte arrays. For mostly ASCII text that is 1 byte per char instead of 2, and you avoid the overhead of the String object header (which probably adds about 32 bytes per object).

Of course, you wouldn't be able to use any String methods on these without first converting them back to String.
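A minimal sketch of the plain byte[] approach (the helper class Utf8Bytes and its method names are hypothetical) shows both costs mentioned above: every String method needs a decode first, and a byte[] compares by identity, so content comparison needs java.util.Arrays.equals:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helpers for storing strings as plain UTF-8 byte arrays. */
final class Utf8Bytes {
    private Utf8Bytes() {}

    /** Encode once, when the string is stored. */
    static byte[] encode(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Any String method needs a decode first, which allocates a temporary String. */
    static boolean startsWith(byte[] utf8, String prefix) {
        return new String(utf8, StandardCharsets.UTF_8).startsWith(prefix);
    }

    /** byte[].equals() is identity-based, so compare contents explicitly. */
    static boolean sameText(byte[] a, byte[] b) {
        return Arrays.equals(a, b);
    }

    public static void main(String[] args) {
        byte[] x = encode("hello world");
        byte[] y = encode("hello world");
        System.out.println(x.length);               // 11: one byte per ASCII char
        System.out.println(x.equals(y));            // false: two different array objects
        System.out.println(sameText(x, y));         // true: same UTF-8 content
        System.out.println(startsWith(x, "hello")); // true, after decoding
    }
}
```

For the same reason, a bare byte[] cannot serve as a HashMap or HashSet key; that is where a wrapper class becomes necessary.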

But if you really want to save space, store the strings back-to-back in a few large arrays, with "dope vectors" (arrays of offsets) to locate the individual strings.
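A sketch of the back-to-back layout, assuming a simple append-only use case: the class PackedUtf8Store is hypothetical, and its int[] of start offsets plays the role of the dope vector. For brevity it uses a single growable byte[]; for hundreds of MB you would more likely split the data across a few fixed-size chunks, since a Java array holds at most about 2^31 elements.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical append-only store: the UTF-8 bytes of all strings live back-to-back
 * in one byte[], and an int[] of start offsets acts as the dope vector.
 * String i occupies data[offsets[i] .. offsets[i + 1]).
 */
final class PackedUtf8Store {
    private byte[] data = new byte[1024];
    private int[] offsets = new int[64]; // offsets[size] marks the end of used data
    private int size;                    // number of strings stored

    /** Appends s and returns its index. */
    int add(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        int start = offsets[size];
        if (start + b.length > data.length) {
            data = Arrays.copyOf(data, Math.max(data.length * 2, start + b.length));
        }
        System.arraycopy(b, 0, data, start, b.length);
        if (size + 2 > offsets.length) {
            offsets = Arrays.copyOf(offsets, offsets.length * 2);
        }
        offsets[size + 1] = start + b.length;
        return size++;
    }

    /** Decodes string i back to a java.lang.String. */
    String get(int i) {
        if (i < 0 || i >= size) throw new IndexOutOfBoundsException("index " + i);
        int start = offsets[i];
        return new String(data, start, offsets[i + 1] - start, StandardCharsets.UTF_8);
    }

    int size() {
        return size;
    }

    /** Bytes used by string contents, excluding unused capacity. */
    int dataBytes() {
        return offsets[size];
    }
}
```

Each stored string then costs only its UTF-8 bytes plus one 4-byte entry in offsets, with no per-string object header; callers refer to strings by int index, and get allocates a temporary String on each call.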

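Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM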