
Is there a string type with 8-bit chars?

I need to store a lot of strings in RAM. They do not contain special Unicode characters; they contain only characters from "ISO 8859-1", which take one byte each.

I could convert every string, store it in memory, and convert it back whenever I want to use it with .Contains() and similar methods, but that would add overhead (in my opinion) and be slow.

Is there a string class that is fast and reliable and offers some methods of the original string class, like .Contains()?

I need this to store more strings in memory while using less RAM. Or is there another way to do it?

Update:

Thank you for your comments and your answer.

I have a class that stores strings. With one method call I need to figure out whether I already have a given string in memory. I have to check about 1000 strings per second against the list, which holds hundreds of millions of strings in total. The average string is about 20 chars long. It is really the RAM that concerns me.

I even thought about compressing a few million strings at a time and storing these packages in memory. But then I would need to decompress them every time I access the values.

I also tried to use a HashSet, but the memory needed was even higher.

I don't need the actual value, just to know whether the value is in the list. So if there is a hash value that can do it, even better. But everything I found needs more memory than the plain string.

Currently there is no plan for further internationalization. So it is something I would deal with when it is time to :-)

I don't know if using a database would solve it. I don't need to fetch anything, just to know if the value was stored in the class. And I need to do this fast.

It is very unlikely that you will gain any significant performance from storing the strings as bytes. However, if you need to save memory, this strategy may be appropriate.

  • To convert a string to a byte[] for this purpose, use Encoding.Default.GetBytes() [1].

  • To convert a byte[] back to a string for display or other string-based processing, use Encoding.Default.GetString().

  • You can make your code look nicer if you use extension methods defined on string and byte[]. Alternatively, you can wrap the byte[] in a wrapper type and put the methods there. Make this wrapper type a struct, not a class, otherwise it will incur extra heap allocations, which is what you're trying to avoid.
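The wrapper idea from the last bullet could look roughly like the sketch below. The type name Latin1String and its members are hypothetical, not an existing .NET type. It assumes .NET Core 2.1 or later (for ReadOnlySpan<byte>) and uses Encoding.GetEncoding("ISO-8859-1") rather than Encoding.Default so that the encoding does not depend on the machine.

```csharp
using System;
using System.Text;

// Hypothetical wrapper: stores one byte per character instead of two.
public readonly struct Latin1String : IEquatable<Latin1String>
{
    // A fixed encoding; characters outside ISO 8859-1 are replaced with '?'.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    private readonly byte[] _bytes;

    public Latin1String(string value)
    {
        _bytes = Latin1.GetBytes(value);
    }

    // _bytes is null for default(Latin1String), so guard against that.
    public int Length => _bytes == null ? 0 : _bytes.Length;

    // Byte-wise substring search, analogous to string.Contains with ordinal comparison.
    public bool Contains(Latin1String other)
    {
        ReadOnlySpan<byte> haystack = _bytes;
        ReadOnlySpan<byte> needle = other._bytes;
        return haystack.IndexOf(needle) >= 0;
    }

    public bool Equals(Latin1String other)
    {
        ReadOnlySpan<byte> left = _bytes;
        ReadOnlySpan<byte> right = other._bytes;
        return left.SequenceEqual(right);
    }

    public override bool Equals(object obj) => obj is Latin1String other && Equals(other);

    // FNV-1a over the bytes, so the struct can be used as a hash key.
    public override int GetHashCode()
    {
        unchecked
        {
            uint hash = 2166136261;
            if (_bytes != null)
            {
                foreach (byte b in _bytes)
                {
                    hash = (hash ^ b) * 16777619;
                }
            }
            return (int)hash;
        }
    }

    // Convert back only when a real string is needed, e.g. for display.
    public override string ToString() => _bytes == null ? string.Empty : Latin1.GetString(_bytes);
}
```

Usage would be along the lines of `new Latin1String("café au lait").Contains(new Latin1String("au"))`. Note that each instance still carries the byte[] object header, so the saving on 20-character strings is less than the full 50% the character width suggests.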

I want to warn you, though: you are throwing away the ability to have Unicode in your application. Normally, all alarm bells should go off every time you think you need to do this. It is best to structure your code in such a way that you can easily go back to using string once memory sizes have gone up and memory consumption stops being an issue.


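[1] On .NET Framework, Encoding.Default returns the current 8-bit code page of the running operating system. The default on English-language Windows is Windows-1252, which is close to what you want (it differs from ISO 8859-1 only in the 0x80-0x9F range). On Russian Windows it will be Windows-1251 (Cyrillic), and so on. On .NET Core and later, Encoding.Default is always UTF-8, so use Encoding.GetEncoding("ISO-8859-1") there if you need a fixed single-byte encoding.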

As per the comments, this is basically a bad idea. If you have to do it, byte[] is your friend. There is no byte-oriented string class in .NET.

Check out the string.Intern method, which could help you out:

http://www.yoda.arachsys.com/csharp/strings.html

http://en.csharp-online.net/CSharp_String_Theory%E2%80%94String_intern_pool
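To make the suggestion concrete, here is a small sketch of what string.Intern does. The class and method names are made up for illustration. Interning only saves memory when the same value occurs many times, because it does not make an individual string smaller, and interned strings stay in the pool for the lifetime of the process.

```csharp
using System;

public static class InternDemo
{
    // Returns true if two equal strings are separate objects before
    // interning and share a single instance afterwards.
    public static bool SharesInstanceAfterIntern()
    {
        // Built at run time, so these are two distinct string objects.
        string a = new string(new[] { 'a', 'b', 'c' });
        string b = new string(new[] { 'a', 'b', 'c' });
        bool sameBefore = ReferenceEquals(a, b);

        // Intern returns the pooled instance for each value.
        a = string.Intern(a);
        b = string.Intern(b);
        bool sameAfter = ReferenceEquals(a, b);

        return !sameBefore && sameAfter;
    }
}
```

If nearly all of the strings are distinct, as the question suggests, interning would likely add overhead rather than remove it.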

However, looking at your requirements, I think you are over-engineering it. You have 1000 strings at 20 chars each, so 1000 * 20 * 2 = 40,000 bytes, which is not much memory.

If you really have a large amount, store it in a DB with an index. That would be much faster than anything the average programmer can come up with.
